This repository contains all the data and scripts used to address one of the most relevant current questions in the field of human health:
**What proportion of diseases is caused by alterations in coding versus non-coding regions?**
Team:
- Garcia Flores Fernanda Renee
- Meza Landeros Kevin Emmanuel
- Schafer Juarez Badillo Alejandra Nicole
- Zeferino Garcia Karla
## Folder content
-<ins>*Graphs*</ins>
Contains plots showing the proportion of coding and non-coding sequences in monogenic diseases.
- Grafica1.png
- Grafica2.png
...
...
- sequences_aligned_A.sam
- sequences_aligned_A_sort.bam
-<ins>*data*</ins>
-<ins>*DISEASES DB*</ins>
Stores one of the databases used for the project: a file with all the information on the monogenic diseases it contains.
- human_disease_textmining_full.tsv
- merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
- merge_monogenic_diseases.tsv
-<ins>*Ensembl*</ins>
Harbors the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID
- mart_export_v2.txt
-<ins>*Homo_sapiens*</ins>
Includes the human genome sequence and its annotation.
- Homo_sapiens.GRCh38.100.gff3.gz (annotation)
- Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
-<ins>*OMIM*</ins>
Contains a file with information about different heritable conditions, filtered to keep only the entries that correspond to monogenic diseases.
- gene_filtered_phenENS.txt
-<ins>*scripts*</ins>
Contains the scripts used throughout this project.
- CambioCol.R `Reorders the columns of the file genemap2.txt and produces the file genemap2_reorder.txt`
- ObtencionSecuencias.R `Connects to Ensembl and retrieves the sequences of the selected genes (those associated with a disease).`
- ObtenciondeAllData.R `Processes the files from the two databases (gene_filtered_phenENS.txt & match_v2.tsv) and joins them with the Ensembl information (mart_export_v2.txt). Finally, it generates a file with the information from both databases.`
- alineamiento.sh `Aligns the gene sequences against the Homo sapiens genome`
- get_monogenic_disease_data_DISEASES.sh `Extracts the monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).`
- get_monogenic_disease_data_OMIM.sh `Extracts the monogenic disease information from OMIM (genemap2.txt)`
- mapeo.R `Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest`
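
As a rough illustration of the kind of summary the project aims at, the sketch below counts how many rows of a gene table are labelled `protein_coding` versus any other gene type. It is not one of the project scripts: the function name `gene_type_proportions` is hypothetical, and it assumes a tab-separated file with a header row that contains a column named exactly `Gene type` (as listed for the Ensembl export above). It also treats every gene type other than `protein_coding` as non-coding, which is a simplification and may differ from the classification used in the actual analysis.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository's scripts).
# Assumption: $1 is a tab-separated table whose header row has a "Gene type" column.
gene_type_proportions() {
    awk -F'\t' '
        NR == 1 {
            # Locate the "Gene type" column by its header name
            for (i = 1; i <= NF; i++) if ($i == "Gene type") col = i
            if (!col) { print "Gene type column not found" > "/dev/stderr"; exit 1 }
            next
        }
        {
            total++
            # Simplification: only "protein_coding" counts as coding
            if ($col == "protein_coding") coding++
        }
        END {
            if (total == 0) exit 1
            # Output columns: class, number of genes, percentage of all genes
            printf "coding\t%d\t%.2f\n", coding, 100 * coding / total
            printf "non_coding\t%d\t%.2f\n", total - coding, 100 * (total - coding) / total
        }
    ' "$1"
}
```

After sourcing the function, it could be called as `gene_type_proportions mart_export_v2.txt`, provided the file really is tab-separated with that header. Looking the column up by name rather than by position keeps the sketch independent of the column order in the export.
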
> Important Notes
The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.
## Results

