Showing 1 changed file with 20 additions and 17 deletions
@@ -1,17 +1,17 @@
-1# MONOGENIC DISEASES
+# MONOGENIC DISEASES
 ## Human Genomics Project
 
+Here we present all the data and scripts used to answer one of the most relevant current questions in human health:
+**What proportion of diseases is caused by alterations in coding versus non-coding regions?**
+
 Team:
 - Garcia Flores Fernanda Renee
 - Meza Landeros Kevin Emmanuel
 - Schafer Juarez Badillo Alejandra Nicole
 - Zeferino Garcia Karla
 
-Here we display all the data and scripts used in order to answer one of the most relevant actual questions in the field of Human Health:
-**Which is the proportion of diseases that are caused due to afections in coding and non coding regions?**
-
 ## Folder content
-- Grsphs
+- <ins>*Graphs*</ins>
 Contains plots showing the proportion of coding and non-coding sequences in monogenic diseases.
 - Grafica1.png
 - Grafica2.png
@@ -21,32 +21,35 @@ Here we display all the data and scripts used in order to answer one of the most
 - sequences_aligned_A.sam
 - sequences_aligned_A_sort.bam
 - <ins>*data*</ins>
- - *DISEASES DB*
+ - <ins>*DISEASES DB*</ins>
 Stores one of the databases used in the project, a file containing information on the monogenic diseases it covers.
 - human_disease_textmining_full.tsv
 - merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
 - merge_monogenic_diseases.tsv
- - *Ensembl*
+ - <ins>*Ensembl*</ins>
 Contains the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID.
 - mart_export_v2.txt
- - *Homo_sapiens*
+ - <ins>*Homo_sapiens*</ins>
 Includes the human genome sequence and its annotation.
 - Homo_sapiens.GRCh38.100.gff3.gz (annotation)
 - Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
- - *OMIM*
+ - <ins>*OMIM*</ins>
 Contains a file with information on various heritable conditions, filtered to keep only the entries corresponding to monogenic diseases.
 - gene_filtered_phenENS.txt
 - <ins>*scripts*</ins>
 Contains the scripts used throughout this project.
- - CambioCol.R
- - ObtencionSecuencias.R
- - ObtenciondeAllData.R
- - alineamiento.sh
- - get_monogenic_disease_data_DISEASES.sh
- - get_monogenic_disease_data_OMIM.sh
- - mapeo.R
+ - CambioCol.R `Reorders the columns of genemap2.txt and produces genemap2_reorder.txt.`
+ - ObtencionSecuencias.R `Connects to Ensembl and retrieves the sequences of the selected genes (those associated with a disease).`
+ - ObtenciondeAllData.R `Processes the files from the two databases (gene_filtered_phenENS.txt & match_v2.tsv) and joins them with the Ensembl information (mart_export_v2.txt). Finally, it generates a single file with the information from both databases.`
+ - alineamiento.sh `Aligns the gene sequences to the Homo sapiens genome.`
+ - get_monogenic_disease_data_DISEASES.sh `Retrieves monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).`
+ - get_monogenic_disease_data_OMIM.sh `Retrieves monogenic disease information from OMIM (genemap2.txt).`
+ - mapeo.R `Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest.`
 
+> Important Notes
+
+The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.
 
 ## Results
 
-
+
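CambioCol.R is described as reordering the columns of genemap2.txt to produce genemap2_reorder.txt. The R script is not shown here, so the following is only a minimal Python sketch of that kind of step, assuming a tab-separated file with a single header row. The function name `reorder_columns` and the column names used in the demo (`mim`, `symbol`, `ensembl_id`) are hypothetical and do not describe the real genemap2.txt layout.

```python
import csv
import io
from typing import List, TextIO


def reorder_columns(src: TextIO, dst: TextIO, first: List[str]) -> None:
    """Copy a tab-separated table, moving the columns named in `first` to the front.

    Columns not listed in `first` keep their original relative order, so no data
    is dropped. A KeyError is raised if a requested column is missing.
    """
    reader = csv.reader(src, delimiter="\t")
    header = next(reader)
    missing = [c for c in first if c not in header]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    order = list(first) + [c for c in header if c not in first]
    idx = [header.index(c) for c in order]  # positions of the new column order
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(order)
    for row in reader:
        writer.writerow([row[i] for i in idx])


if __name__ == "__main__":
    # Hypothetical three-column table, not real OMIM data.
    src = io.StringIO("mim\tsymbol\tensembl_id\n100100\tABC1\tENSG0001\n")
    dst = io.StringIO()
    reorder_columns(src, dst, ["ensembl_id"])
    print(dst.getvalue(), end="")
```

Keeping the unlisted columns instead of discarding them matches the description "reorders" rather than "selects"; whether the R script also drops columns is not stated.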
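ObtenciondeAllData.R is described as combining the two disease databases and joining them with the Ensembl export (mart_export_v2.txt). A minimal Python sketch of such a join follows, assuming each source has already been read into dictionaries that share a hypothetical `gene_id` key (for example an Ensembl gene stable ID). The function and field names are illustrative and are not taken from the R script.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def combine_sources(omim_rows: Iterable[Record], diseases_rows: Iterable[Record]) -> List[Record]:
    """Tag every record with its database of origin and concatenate both sources."""
    tagged = [{**row, "source": "OMIM"} for row in omim_rows]
    tagged += [{**row, "source": "DISEASES"} for row in diseases_rows]
    return tagged


def join_with_ensembl(records: Iterable[Record], ensembl_rows: Iterable[Record],
                      key: str = "gene_id") -> List[Record]:
    """Inner join: attach Ensembl attributes to each record with a matching key.

    An export with a protein ID column can list several rows for the same gene,
    so only the first Ensembl row seen for each gene is kept. Records without a
    match are dropped. On key collisions the Ensembl value wins.
    """
    by_gene: Dict[str, Record] = {}
    for row in ensembl_rows:
        by_gene.setdefault(row[key], row)
    merged: List[Record] = []
    for rec in records:
        info = by_gene.get(rec[key])
        if info is not None:
            merged.append({**rec, **info})
    return merged


if __name__ == "__main__":
    omim = [{"gene_id": "ENSG1", "disease": "D1"}]
    diseases = [{"gene_id": "ENSG2", "disease": "D2"}]
    ensembl = [{"gene_id": "ENSG1", "start": "100", "end": "200"},
               {"gene_id": "ENSG2", "start": "300", "end": "400"}]
    for rec in join_with_ensembl(combine_sources(omim, diseases), ensembl):
        print(rec)
```

An inner join is only one choice; a left join that keeps disease records without Ensembl coordinates would make unmatched genes visible instead of silently dropping them.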
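mapeo.R is described as combining the sorted alignment with the GFF3 annotation to annotate the genes of interest. One ingredient of that step is knowing which reference interval each aligned sequence covers. The sketch below reads the plain-text SAM format (as in sequences_aligned_A.sam) using only the standard library and derives the 1-based, inclusive reference span from the POS field and the CIGAR string. It is an illustration, not what mapeo.R does internally; reading the binary BAM file would need a dedicated tool such as pysam or a `samtools view` conversion.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def parse_sam(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (query name, reference name, start, end) for each mapped record."""
    for line in lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":  # 0x4 marks an unmapped segment
            continue
        start, end = reference_span(int(fields[3]), cigar)
        yield fields[0], fields[2], start, end


if __name__ == "__main__":
    demo = ["@HD\tVN:1.6\n",
            "gene1\t0\tchr1\t100\t60\t10M2D5M3S\t*\t0\t0\tACGT\t*\n"]
    print(list(parse_sam(demo)))  # [('gene1', 'chr1', 100, 116)]
```

Soft clips (`S`) and insertions (`I`) are excluded from the span because they do not consume reference bases, while deletions (`D`) are included.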
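To decide whether an aligned gene region falls in coding sequence, the GFF3 annotation (Homo_sapiens.GRCh38.100.gff3.gz) can be searched for overlapping features. The sketch below assumes a simple, hypothetical rule that this repository does not state: a region counts as "coding" if it overlaps at least one CDS feature on the same sequence, and as "non-coding" otherwise. GFF3 coordinates are 1-based and inclusive, which is why the overlap test compares with `<=` on both ends.

```python
import gzip
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def load_cds(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Collect CDS intervals per sequence ID from GFF3 lines, skipping comments."""
    cds: Dict[str, List[Interval]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        cds.setdefault(cols[0], []).append((int(cols[3]), int(cols[4])))
    return cds


def classify(seqid: str, start: int, end: int, cds: Dict[str, List[Interval]]) -> str:
    """Return "coding" if [start, end] overlaps any CDS on seqid, else "non-coding"."""
    for s, e in cds.get(seqid, []):
        if start <= e and s <= end:
            return "coding"
    return "non-coding"


def load_cds_from_gz(path: str) -> Dict[str, List[Interval]]:
    """Read a gzip-compressed GFF3 file such as Homo_sapiens.GRCh38.100.gff3.gz."""
    with gzip.open(path, "rt") as fh:
        return load_cds(fh)
```

The linear scan in `classify` keeps the sketch short; for genome-wide use, sorting the intervals per sequence and searching them with `bisect`, or using an interval tree, would be considerably faster.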
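The project question and the Graphs folder both concern the proportion of diseases linked to coding versus non-coding regions. Assuming each disease has already been paired with one or more region labels such as "coding" or "non-coding" (hypothetical labels; how the project assigns them is not described here), the proportions could be computed as below. Each distinct (disease, label) pair is counted once, so a disease reported through several genes in the same category is not over-counted, while a disease affecting both categories contributes once to each; the denominator is therefore the number of distinct pairs, not the number of diseases.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def disease_proportions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of distinct (disease, label) pairs that fall under each label."""
    unique = set(pairs)  # collapse repeated (disease, label) pairs
    counts = Counter(label for _, label in unique)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    example = [("D1", "coding"), ("D1", "coding"), ("D2", "non-coding"),
               ("D3", "coding"), ("D4", "coding")]
    print(disease_proportions(example))
```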