MONOGENIC DISEASES
Human Genomics Project
This repository contains all the data and scripts used to address one of the most relevant current questions in human health:
What proportion of diseases is caused by alterations in coding regions, and what proportion by alterations in non-coding regions?
Team:
- Garcia Flores Fernanda Renee
- Meza Landeros Kevin Emmanuel
- Schafer Juarez Badillo Alejandra Nicole
- Zeferino Garcia Karla
Folder content
- Graphs
  Plots showing the proportion of coding and non-coding sequences in monogenic diseases.
  - Grafica1.png
  - Grafica2.png
- alignments
  Files produced by aligning the sequences of the genes of interest (those that cause a monogenic disease) to the human genome.
  - sequences_aligned_A.bam
  - sequences_aligned_A.sam
  - sequences_aligned_A_sort.bam
- data
  - DISEASES DB
    Stores one of the databases used for the project: a file with all the information on the monogenic diseases it contains.
    - human_disease_textmining_full.tsv
    - merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
    - merge_monogenic_diseases.tsv
  - Ensembl
    Holds the following information about human genes: Gene start (bp), Gene end (bp), Gene type, Gene name, Strand, and Protein stable ID.
    - mart_export_v2.txt
  - Homo_sapiens
    Includes the human genome sequence and its annotation.
    - Homo_sapiens.GRCh38.100.gff3.gz (annotation)
    - Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
  - OMIM
    Contains a file with information about different heritable conditions, filtered to keep only the entries that correspond to monogenic diseases.
    - gene_filtered_phenENS.txt
- scripts
  Scripts used throughout this project.
  - CambioCol.R
    Reorders the columns of genemap2.txt and produces genemap2_reorder.txt.
  - ObtencionSecuencias.R
    Connects to Ensembl and retrieves the sequences of the selected genes (those associated with a disease).
  - ObtenciondeAllData.R
    Processes the files from the two databases (gene_filtered_phenENS.txt & match_v2.tsv) and joins them with the Ensembl information (mart_export_v2.txt). Finally, it generates a file with the information from both databases.
  - alineamiento.sh
    Aligns the gene sequences to the Homo sapiens genome.
  - get_monogenic_disease_data_DISEASES.sh
    Gets the monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).
  - get_monogenic_disease_data_OMIM.sh
    Gets the monogenic disease information from OMIM (genemap2.txt).
  - mapeo.Rmd/mapeo.html
    Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest.
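As a rough illustration of how the Ensembl fields listed above could feed the coding versus non-coding question, the following Python sketch counts distinct genes by their "Gene type" value in a tab-separated export laid out like mart_export_v2.txt. It is not one of the project's scripts. The sample rows, gene names, and protein IDs are invented, the function name `coding_proportions` is hypothetical, and treating `protein_coding` as "coding" and every other gene type as "non-coding" is an assumption that only approximates where a disease-causing alteration actually lies.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in the layout of a BioMart export such as mart_export_v2.txt.
# The column names follow the fields listed in this README; the rows are made up.
SAMPLE_EXPORT = (
    "Gene start (bp)\tGene end (bp)\tGene type\tGene name\tStrand\tProtein stable ID\n"
    "100\t900\tprotein_coding\tGENE_A\t1\tENSP_EXAMPLE_1\n"
    "100\t900\tprotein_coding\tGENE_A\t1\tENSP_EXAMPLE_2\n"
    "2000\t2600\tlncRNA\tGENE_B\t-1\t\n"
    "5000\t7000\tprotein_coding\tGENE_C\t1\tENSP_EXAMPLE_3\n"
    "9000\t9100\tmiRNA\tGENE_D\t-1\t\n"
)


def coding_proportions(handle):
    """Return the fraction of distinct genes that are protein coding vs. not."""
    gene_types = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # A gene can appear once per protein ID, so keep one entry per gene name.
        gene_types[row["Gene name"]] = row["Gene type"]
    # Assumption: "protein_coding" counts as coding, everything else as non-coding.
    counts = Counter(
        "coding" if gene_type == "protein_coding" else "non_coding"
        for gene_type in gene_types.values()
    )
    total = sum(counts.values())
    if total == 0:
        return {"coding": 0.0, "non_coding": 0.0}
    return {label: counts[label] / total for label in ("coding", "non_coding")}


proportions = coding_proportions(io.StringIO(SAMPLE_EXPORT))
print(proportions)
```

To try it on a real export, the `io.StringIO` sample can be swapped for an open file handle, provided the file's header row uses the same column names.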
Important Notes
The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.