# MONOGENIC DISEASES
## Human Genomics Project
This repository contains all the data and scripts used to answer one of the most relevant current questions in human health:
**What proportion of monogenic diseases are caused by alterations in coding and non-coding regions?**
Team:
- Garcia Flores Fernanda Renee
- Meza Landeros Kevin Emmanuel
- Schafer Juarez Badillo Alejandra Nicole
- Zeferino Garcia Karla
## Folder content
- <ins>*Graphs*</ins>
Contains plots showing the proportion of coding and non-coding sequences associated with monogenic diseases.
- Grafica1.png
- Grafica2.png
......
- sequences_aligned_A.sam
- sequences_aligned_A_sort.bam
- <ins>*data*</ins>
- <ins>*DISEASES DB*</ins>
Stores one of the databases used in the project, including a file with all the information on the monogenic diseases it contains.
- human_disease_textmining_full.tsv
- merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
- merge_monogenic_diseases.tsv
- <ins>*Ensembl*</ins>
Contains the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID.
- mart_export_v2.txt
- <ins>*Homo_sapiens*</ins>
Includes the human genome sequence and its annotation.
- Homo_sapiens.GRCh38.100.gff3.gz (annotation)
- Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
- <ins>*OMIM*</ins>
Contains a file with information about various heritable conditions, filtered to keep only the entries that correspond to monogenic diseases.
- gene_filtered_phenENS.txt
- <ins>*scripts*</ins>
Contains the scripts used throughout this project.
- CambioCol.R `Reorders the columns of the genemap2.txt file and produces the file genemap2_reorder.txt.`
- ObtencionSecuencias.R `Connects to Ensembl and retrieves the sequences of the selected genes (those associated with a disease).`
- ObtenciondeAllData.R `Processes the files from the 2 databases (gene_filtered_phenENS.txt & match_v2.tsv) and joins them with the Ensembl information (mart_export_v2.txt). Finally, it generates a file with the information from both databases.`
- alineamiento.sh `Aligns the gene sequences to the Homo sapiens genome.`
- get_monogenic_disease_data_DISEASES.sh `Retrieves the monogenic disease information from the DISEASES DB (human_disease_textmining_full.tsv).`
- get_monogenic_disease_data_OMIM.sh `Retrieves the monogenic disease information from OMIM (genemap2.txt).`
- mapeo.R `Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest.`
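`mapeo.R` is described as combining the alignment with the GFF3 annotation to annotate the genes of interest. The core idea is an interval-overlap lookup: GFF3 stores one feature per line in nine tab-separated columns, with 1-based inclusive start and end coordinates, so a feature overlaps an aligned region when it lies on the same sequence and the two intervals intersect. The Python sketch below illustrates only that idea; it is not a translation of `mapeo.R` (which is written in R), and the records in `EXAMPLE_GFF3`, the gene names in them, and the helper names `parse_gff3` and `overlapping` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    ftype: str                 # GFF3 column 3, e.g. "gene" or "exon"
    start: int                 # 1-based, inclusive (GFF3 convention)
    end: int                   # 1-based, inclusive
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterable[Feature]:
    """Yield one Feature per GFF3 data line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines; a stricter parser could raise instead
        # Column 9 holds key=value pairs separated by ';' (left URL-encoded here).
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attrs)


def overlapping(features: Iterable[Feature], seqid: str, start: int, end: int) -> List[Feature]:
    """Return features on `seqid` whose interval intersects the closed interval [start, end]."""
    return [f for f in features if f.seqid == seqid and f.start <= end and start <= f.end]


# Hypothetical records in an Ensembl-like GFF3 style (not taken from the real file).
EXAMPLE_GFF3 = (
    "##gff-version 3\n"
    "1\tensembl\tgene\t1000\t5000\t.\t+\t.\tID=gene:GENE_A;biotype=protein_coding\n"
    "1\tensembl\texon\t1000\t1200\t.\t+\t.\tParent=transcript:TX_A\n"
    "1\tensembl\tgene\t8000\t9000\t.\t-\t.\tID=gene:GENE_B;biotype=lncRNA\n"
)

if __name__ == "__main__":
    features = list(parse_gff3(EXAMPLE_GFF3.splitlines()))
    # A hypothetical aligned region on sequence "1", positions 1100-1150.
    for f in overlapping(features, "1", 1100, 1150):
        print(f.ftype, f.attributes)
```

For the real annotation, the lines could be streamed with `gzip.open("Homo_sapiens.GRCh38.100.gff3.gz", "rt")` instead of the example string. A linear scan is fine for a handful of regions; for many alignments against the whole genome, an interval index or a dedicated genomic-ranges library would be the more practical choice.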
> Important Notes
The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.
## Results
![Biotype of the genes that cause Mendelian diseases.](Graphs/Grafica2.png)
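The figure summarizes the biotypes of disease-causing genes. As a hedged sketch of how a coding versus non-coding proportion of this kind could be tallied, the Python below counts biotypes and reports the fraction classified as coding. Two assumptions are labeled in the code: only Ensembl's `protein_coding` biotype is treated as coding, and the input is a tab-separated, BioMart-style table with a `Gene type` column (the field listed for `mart_export_v2.txt`). The helper names `read_gene_types` and `coding_summary` are hypothetical and do not come from the project's scripts.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: only Ensembl's "protein_coding" biotype is counted as coding;
# every other gene type (lncRNA, miRNA, pseudogenes, ...) counts as non-coding.
CODING_BIOTYPES = {"protein_coding"}


def read_gene_types(path: str, column: str = "Gene type") -> List[str]:
    """Read the gene type of each row from a tab-separated, BioMart-style export.

    The tab delimiter and the "Gene type" header are assumptions about the file.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row[column] for row in reader if row.get(column)]


def coding_summary(gene_types: Iterable[str]) -> Tuple[Counter, float]:
    """Count each biotype and return (counts, fraction of entries classified as coding)."""
    counts = Counter(gene_types)
    total = sum(counts.values())
    if total == 0:
        return counts, 0.0
    coding = sum(n for biotype, n in counts.items() if biotype in CODING_BIOTYPES)
    return counts, coding / total


if __name__ == "__main__":
    # Hypothetical input; real values would come from read_gene_types(...).
    counts, fraction = coding_summary(["protein_coding", "protein_coding", "lncRNA", "protein_coding"])
    print(counts.most_common(), f"coding fraction: {fraction:.2f}")
```

Since `mart_export_v2.txt` may describe more genes than the disease-associated set, the counts would need to come from the disease genes only (for instance, the merged table produced by `ObtenciondeAllData.R`, whose exact file name is not listed here). Counting rows gives a proportion over gene-disease entries; deduplicating genes first gives a proportion over genes.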
......