Here we present all the data and scripts used to answer one of the most relevant current questions in the field of human health:
**What proportion of diseases is caused by alterations in coding and non-coding regions?**
## Folder content
- *alignments*
Contains the files that result from aligning the sequences of the genes of interest (those that cause a monogenic disease) to the human genome.
- sequences_aligned_A.bam
- sequences_aligned_A.sam
- sequences_aligned_A_sort.bam
- *data*
- *DISEASES DB*
Stores one of the databases used for the project and a file with all the information on the monogenic diseases it contains.
- human_disease_textmining_full.tsv
- merge_list_monogenic_diseases.tsv
- merge_monogenic_diseases.tsv
- *Ensembl*
Holds information about human genes.
- mart_export_v2.txt
- *Homo_sapiens*
Includes the human genome sequence and its annotation.
- Homo_sapiens.GRCh38.100.gff3.gz
- Homo_sapiens.GRCh38.dna.alt.fa.gz
- *OMIM*
Contains one file with information about different heritable conditions and another with the information that corresponds to monogenic diseases.
- genemap2.txt
- *scripts*
Contains the scripts that were used throughout this project.
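
As an illustration of how the annotation in *Homo_sapiens* could feed into the coding versus non-coding question, the following is a minimal Python sketch that totals the number of bases per feature type in a GFF3 file. It is not one of the project's scripts: the function names `count_feature_bases` and `count_from_gzip` are hypothetical, and the sample records are made up for the demonstration. It relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3, 1-based inclusive start and end in columns 4 and 5). Overlapping features are counted more than once, so the totals are only a rough first look.

```python
import gzip
import io
from collections import Counter


def count_feature_bases(lines):
    """Sum the length in bases of each feature type found in GFF3 lines."""
    totals = Counter()
    for line in lines:
        # Skip blank lines, directives and comments (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        feature_type = fields[2]
        start, end = int(fields[3]), int(fields[4])
        # GFF3 coordinates are 1-based and inclusive.
        totals[feature_type] += end - start + 1
    return totals


def count_from_gzip(path):
    """Hypothetical helper: run the same count on a gzipped GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return count_feature_bases(handle)


# Made-up sample records, used only to show the function in action.
sample = (
    "##gff-version 3\n"
    "1\tdemo\tgene\t100\t499\t.\t+\t.\tID=gene:demo\n"
    "1\tdemo\tCDS\t150\t249\t.\t+\t0\tID=CDS:demo\n"
    "1\tdemo\tCDS\t300\t349\t.\t+\t0\tID=CDS:demo\n"
)
totals = count_feature_bases(io.StringIO(sample))
print(totals["CDS"], totals["gene"])  # prints: 150 400
```

To try it on the real annotation, `count_from_gzip` could be called with the path to `Homo_sapiens.GRCh38.100.gff3.gz` wherever that file is stored locally.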