
MONOGENIC DISEASES

Human Genomics Project

Here we present all the data and scripts used to answer one of the most relevant current questions in the field of human health:
What proportion of diseases is caused by alterations in coding versus non-coding regions?

Team:

  • Garcia Flores Fernanda Renee
  • Meza Landeros Kevin Emmanuel
  • Schafer Juarez Badillo Alejandra Nicole
  • Zeferino Garcia Karla

Folder contents

  • Graphs
    Contains plots showing the proportion of coding and non-coding sequences in monogenic diseases.
    • Grafica1.png
    • Grafica2.png
  • alignments
    Contains the files resulting from aligning the sequences of the genes of interest (those that cause a monogenic disease) to the human genome.
    • sequences_aligned_A.bam
    • sequences_aligned_A.sam
    • sequences_aligned_A_sort.bam
  • data
    • DISEASES DB
      Stores one of the databases used in the project, a file containing all of its information on monogenic diseases.
      • human_disease_textmining_full.tsv
      • merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
      • merge_monogenic_diseases.tsv
    • Ensembl
      Harbors the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID
      • mart_export_v2.txt
    • Homo_sapiens
      Includes the human genome sequence and its annotation.
      • Homo_sapiens.GRCh38.100.gff3.gz (annotation)
      • Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
    • OMIM
      Contains a file with information about different heritable conditions, filtered to keep the entries that correspond to monogenic diseases.
      • gene_filtered_phenENS.txt
  • scripts
    Contains the scripts used throughout this project.
    • CambioCol.R Reorders the columns of genemap2.txt and produces genemap2_reorder.txt.
    • ObtencionSecuencias.R Connects to Ensembl and retrieves the sequences of the selected genes (those associated with a disease).
    • ObtenciondeAllData.R Processes the files from the two databases (gene_filtered_phenENS.txt & match_v2.tsv) and merges them with the Ensembl information (mart_export_v2.txt). Finally, it generates a file with the information from both databases.
    • alineamiento.sh Aligns the gene sequences to the Homo sapiens genome.
    • get_monogenic_disease_data_DISEASES.sh Retrieves monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).
    • get_monogenic_disease_data_OMIM.sh Retrieves monogenic disease information from OMIM (genemap2.txt).
    • mapeo.Rmd/mapeo.html Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest.
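The annotation step in mapeo.Rmd is written in R, and its exact logic is not reproduced here. As a rough illustration of how an aligned region can be labeled coding or non-coding from the GFF3 annotation, the following is a minimal Python sketch. It assumes the region coordinates (sequence ID, start, end) have already been extracted from the sorted BAM, for example with `samtools view`, and it treats a region as coding only if it overlaps a feature of type `CDS`. The function names are hypothetical.

```python
import gzip
from collections import defaultdict


def parse_cds_intervals(lines):
    """Collect CDS intervals per sequence ID from GFF3 lines (hypothetical helper)."""
    cds = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue  # keep only well-formed CDS records
        # GFF3 columns: seqid, source, type, start, end, ... (1-based, inclusive)
        cds[fields[0]].append((int(fields[3]), int(fields[4])))
    return cds


def load_cds_intervals(path):
    """Read CDS intervals from a gzipped GFF3 file such as Homo_sapiens.GRCh38.100.gff3.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_cds_intervals(handle)


def classify_region(cds, seqid, start, end):
    """Label a 1-based inclusive region 'coding' if it overlaps any CDS, else 'non-coding'."""
    for cds_start, cds_end in cds.get(seqid, []):
        if start <= cds_end and cds_start <= end:
            return "coding"
    return "non-coding"


if __name__ == "__main__":
    # Tiny hypothetical annotation with a single CDS on sequence "1".
    demo = ["##gff-version 3\n", "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:demo\n"]
    intervals = parse_cds_intervals(demo)
    print(classify_region(intervals, "1", 150, 160))  # coding
    print(classify_region(intervals, "1", 300, 400))  # non-coding
```

Because only `CDS` features count as coding here, regions that fall in UTRs or introns of protein-coding genes are labeled non-coding. The linear scan is fine for a handful of genes, but a genome-wide run would need an interval index.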

Important Notes

The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.

Results

Biotype of the genes that cause Mendelian diseases.
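A biotype breakdown like the one summarized above can be approximated from the Ensembl export. The sketch below is not the script that produced the plots in Graphs. It assumes that mart_export_v2.txt is tab-separated with a header row using the column names listed for the Ensembl folder (for example `Gene type` and `Gene name`), and it counts each gene only once even if it appears on several rows. The function name and the demo genes are hypothetical.

```python
import csv
import io
from collections import Counter


def biotype_proportions(handle):
    """Fraction of unique genes per biotype in a tab-separated Ensembl export (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    seen = set()
    counts = Counter()
    for row in reader:
        # A gene can appear on several rows (e.g. one per Protein stable ID),
        # so each (name, start, end) combination is counted once.
        key = (row["Gene name"], row["Gene start (bp)"], row["Gene end (bp)"])
        if key in seen:
            continue
        seen.add(key)
        counts[row["Gene type"]] += 1
    total = sum(counts.values())
    return {biotype: n / total for biotype, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical rows: GENE1 appears twice (two protein IDs), GENE2 once.
    example = io.StringIO(
        "Gene start (bp)\tGene end (bp)\tGene type\tGene name\tStrand\tProtein stable ID\n"
        "100\t500\tprotein_coding\tGENE1\t1\tENSP01\n"
        "100\t500\tprotein_coding\tGENE1\t1\tENSP02\n"
        "900\t1200\tlncRNA\tGENE2\t-1\t\n"
    )
    print(biotype_proportions(example))  # {'protein_coding': 0.5, 'lncRNA': 0.5}
```

To apply it to the real export, pass an open file handle (opened with `newline=""`) instead of the in-memory example; collapsing every biotype other than `protein_coding` into one group would give a simple coding versus non-coding split.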