Kevin Meza Landeros

README.md

Showing 1 changed file with 20 additions and 17 deletions
1 -1# MONOGENIC DISEASES 1 +# MONOGENIC DISEASES
2 ## Human Genomics Project 2 ## Human Genomics Project
3 3
4 +Here we present all the data and scripts used to answer one of the most relevant current questions in the field of human health:
5 +**What proportion of diseases is caused by alterations in coding versus non-coding regions?**
6 +
4 Team: 7 Team:
5 - Garcia Flores Fernanda Renee 8 - Garcia Flores Fernanda Renee
6 - Meza Landeros Kevin Emmanuel 9 - Meza Landeros Kevin Emmanuel
7 - Schafer Juarez Badillo Alejandra Nicole 10 - Schafer Juarez Badillo Alejandra Nicole
8 - Zeferino Garcia Karla 11 - Zeferino Garcia Karla
9 12
10 -Here we present all the data and scripts used to answer one of the most relevant current questions in the field of human health:
11 -**What proportion of diseases is caused by alterations in coding versus non-coding regions?**
12 -
13 ## Folder content 13 ## Folder content
14 -- Grsphs 14 +- <ins>*Graphs*</ins>
15 Contains plots showing the proportion of coding and non-coding sequences in monogenic diseases. 15 Contains plots showing the proportion of coding and non-coding sequences in monogenic diseases.
16 - Grafica1.png 16 - Grafica1.png
17 - Grafica2.png 17 - Grafica2.png
...@@ -21,32 +21,35 @@ Here we display all the data and scripts used in order to answer one of the most ...@@ -21,32 +21,35 @@ Here we display all the data and scripts used in order to answer one of the most
21 - sequences_aligned_A.sam 21 - sequences_aligned_A.sam
22 - sequences_aligned_A_sort.bam 22 - sequences_aligned_A_sort.bam
23 - <ins>*data*</ins> 23 - <ins>*data*</ins>
24 - - *DISEASES DB* 24 + - <ins>*DISEASES DB*</ins>
25 Stores one of the databases used for the project: a file with all the information on the monogenic diseases it contains. 25 Stores one of the databases used for the project: a file with all the information on the monogenic diseases it contains.
26 - human_disease_textmining_full.tsv 26 - human_disease_textmining_full.tsv
27 - merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv") 27 - merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
28 - merge_monogenic_diseases.tsv 28 - merge_monogenic_diseases.tsv
29 - - *Ensembl* 29 + - <ins>*Ensembl*</ins>
30 Harbors the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID 30 Harbors the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID
31 - mart_export_v2.txt 31 - mart_export_v2.txt
32 - - *Homo_sapiens* 32 + - <ins>*Homo_sapiens*</ins>
33 Includes the human genome sequence and its annotation. 33 Includes the human genome sequence and its annotation.
34 - Homo_sapiens.GRCh38.100.gff3.gz (annotation) 34 - Homo_sapiens.GRCh38.100.gff3.gz (annotation)
35 - Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence) 35 - Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
36 - - *OMIM* 36 + - <ins>*OMIM*</ins>
37 Contains a file with information about different heritable conditions, filtered to keep the entries that correspond to monogenic diseases. 37 Contains a file with information about different heritable conditions, filtered to keep the entries that correspond to monogenic diseases.
38 - gene_filtered_phenENS.txt 38 - gene_filtered_phenENS.txt
39 - <ins>*scripts*</ins> 39 - <ins>*scripts*</ins>
40 Contains the scripts that were used throughout this project. 40 Contains the scripts that were used throughout this project.
41 - - CambioCol.R 41 + - CambioCol.R `Reorders the columns of genemap2.txt and produces genemap2_reorder.txt.`
42 - - ObtencionSecuencias.R 42 + - ObtencionSecuencias.R `Connects to Ensembl and retrieves the sequences of the selected (disease-associated) genes.`
43 - - ObtenciondeAllData.R 43 + - ObtenciondeAllData.R `Processes the files from the 2 databases (gene_filtered_phenENS.txt & match_v2.tsv) and joins them with the Ensembl information (mart_export_v2.txt). Finally, generates a file with the information from both databases.`
44 - - alineamiento.sh 44 + - alineamiento.sh `Aligns the gene sequences against the Homo sapiens genome.`
45 - - get_monogenic_disease_data_DISEASES.sh 45 + - get_monogenic_disease_data_DISEASES.sh `Retrieves the monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).`
46 - - get_monogenic_disease_data_OMIM.sh 46 + - get_monogenic_disease_data_OMIM.sh `Retrieves the monogenic disease information from OMIM (genemap2.txt).`
47 - - mapeo.R 47 + - mapeo.R `Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest.`
48 48
49 +> Important Notes
50 +
51 +The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.
49 52
50 ## Results 53 ## Results
51 54
52 -![Biotype of the genes that cause Mendelian diseases.]( Graphs/Grafica2.png) 55 +![Biotype of the genes that cause Mendelian diseases.](Graphs/Grafica2.png)
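
A gene-biotype breakdown like the one in this figure could be computed along the following lines. This is only a minimal sketch in Python (the project's own scripts are R and shell), and it assumes that genes are classified as coding or non-coding by the `Gene type` column listed for `mart_export_v2.txt`. The function name `coding_proportions`, the sample rows, and the `GENE_*` / `PROT_*` identifiers are hypothetical placeholders, not project data or results.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the column layout listed for mart_export_v2.txt.
# The rows and identifiers are placeholders, not project data.
SAMPLE_EXPORT = (
    "Gene start (bp)\tGene end (bp)\tGene type\tGene name\tStrand\tProtein stable ID\n"
    "100\t900\tprotein_coding\tGENE_A\t1\tPROT_A\n"
    "1500\t2100\tprotein_coding\tGENE_B\t-1\tPROT_B\n"
    "3000\t3300\tlncRNA\tGENE_C\t1\t\n"
    "4000\t4100\tsnRNA\tGENE_D\t-1\t\n"
)


def coding_proportions(handle, disease_genes):
    """Return the coding / non-coding fractions among the given disease genes."""
    biotype_by_gene = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # A gene may appear on several rows (one per protein); keep one biotype.
        biotype_by_gene[row["Gene name"]] = row["Gene type"]

    # Assumption: "protein_coding" counts as coding, every other biotype as non-coding.
    counts = Counter(
        "coding" if biotype_by_gene[gene] == "protein_coding" else "non_coding"
        for gene in disease_genes
        if gene in biotype_by_gene  # genes absent from the export are skipped
    )
    total = sum(counts.values())
    if total == 0:
        return {"coding": 0.0, "non_coding": 0.0}
    return {label: counts[label] / total for label in ("coding", "non_coding")}


proportions = coding_proportions(
    io.StringIO(SAMPLE_EXPORT), ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
)
print(proportions)
```

With the real files, the gene list would presumably come from the merged monogenic-disease tables and the handle from the Ensembl export, but the exact inputs used for the published plots are not stated here.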
......