README.md

Kevin Meza Landeros
Commit 0f5180acb34c23cd76c146b1f0fa815fa95b42eb 0f5180ac 1 parent a1220d95
Showing 1 changed file with 20 additions and 17 deletions
README.md
--- a/README.md
+++ b/README.md
-1# MONOGENIC DISEASES
+# MONOGENIC DISEASES
 ## Human Genomics Project
+Here we present all the data and scripts used to answer one of the most relevant current questions in the field of human health:  
+**What proportion of diseases is caused by alterations in coding versus non-coding regions?**
+
 Team:
 - Garcia Flores Fernanda Renee
 - Meza Landeros Kevin Emmanuel
 - Schafer Juarez Badillo Alejandra Nicole
 - Zeferino Garcia Karla 
-Here we display all the data and scripts used in order to answer one of the most relevant actual questions in the field of Human Health:  
-**Which is the proportion of diseases that are caused due to afections in coding and non coding regions?**  
-
 ## Folder content  
-- Grsphs
+- <ins>*Graphs*</ins>  
     Has plots that show the proportion of coding and non-coding sequences of Monogenic Diseases.
     - Grafica1.png
     - Grafica2.png
@@ -21,32 +21,35 @@ Here we display all the data and scripts used in order to answer one of the most
     - sequences_aligned_A.sam
     - sequences_aligned_A_sort.bam
 - <ins>*data*</ins>  
-    - *DISEASES DB*  
+    - <ins>*DISEASES DB*</ins>  
         Stores one of the databases used for the project: a file containing all the information about the monogenic diseases recorded in it.
         - human_disease_textmining_full.tsv 
         - merge_list_monogenic_diseases.tsv (list of genes from "merge_monogenic_diseases.tsv")
         - merge_monogenic_diseases.tsv
-    - *Ensembl*  
+    - <ins>*Ensembl*</ins>  
         Harbors the following information about human genes: Gene start (bp); Gene end (bp); Gene type; Gene name; Strand; Protein stable ID
         - mart_export_v2.txt
-    - *Homo_sapiens*  
+    - <ins>*Homo_sapiens*</ins>  
         Includes the human genome sequence and its annotation.
         - Homo_sapiens.GRCh38.100.gff3.gz (annotation)
         - Homo_sapiens.GRCh38.dna.alt.fa.gz (sequence)
-    - *OMIM*  
+    - <ins>*OMIM*</ins>  
         Contains a file with information about different heritable conditions, filtered to keep only the entries that correspond to monogenic diseases.
         - gene_filtered_phenENS.txt 
 - <ins>*scripts*</ins>  
     Contains the scripts that were used throughout this project.
-    - CambioCol.R
+    - CambioCol.R `Reorders the columns of the file genemap2.txt and produces the file genemap2_reorder.txt`
-    - ObtencionSecuencias.R
+    - ObtencionSecuencias.R `Connects to Ensembl and retrieves the sequences of the selected (disease-associated) genes.`
-    - ObtenciondeAllData.R
+    - ObtenciondeAllData.R `Processes the files from the two databases (gene_filtered_phenENS.txt & match_v2.tsv) and merges them with the Ensembl information (mart_export_v2.txt). Finally, it generates a file with the information from both databases.`
-    - alineamiento.sh
+    - alineamiento.sh `Aligns the gene sequences to the Homo sapiens genome`
-    - get_monogenic_disease_data_DISEASES.sh
+    - get_monogenic_disease_data_DISEASES.sh `Retrieves the monogenic disease information from DISEASES DB (human_disease_textmining_full.tsv).`
-    - get_monogenic_disease_data_OMIM.sh
+    - get_monogenic_disease_data_OMIM.sh `Retrieves the monogenic disease information from OMIM (genemap2.txt)`
-    - mapeo.R
+    - mapeo.R `Uses the alignment (sequences_aligned_sort.bam) and the human genome annotation (Homo_sapiens.GRCh38.100.gff3.gz) to obtain the annotation of the genes of interest`
+> Important Notes  
+  
+The files "genemap2.txt" and "genemap2_reorder.txt" are omitted due to OMIM policy restrictions.
 ## Results  
-![Biotipo de los genes que causan enfermedades Mendelianas.]( Graphs/Grafica2.png)
+![Biotype of the genes that cause Mendelian diseases.](Graphs/Grafica2.png)
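
The README describes the Ensembl export as carrying a "Gene type" column and the Results plot as showing gene biotypes. As a rough illustration of how the coding versus non-coding proportion could be tallied from such a table, here is a minimal Python sketch. It is not the project's own code (the repository scripts are in R and shell), and it rests on assumptions: the table is tab-separated, the column header is literally `Gene type`, and `protein_coding` is the only biotype counted as coding. The function name `biotype_proportions` and the rows in the sample table are hypothetical placeholders, not project data.

```python
import csv
import io
from collections import Counter


def biotype_proportions(handle, column="Gene type"):
    """Return {biotype: fraction of rows} for a tab-separated table.

    Assumes the table has a header row containing `column`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {biotype: n / total for biotype, n in counts.items()}


# Tiny made-up table standing in for a BioMart-style export (placeholder rows).
example = io.StringIO(
    "Gene name\tGene type\n"
    "GENE_A\tprotein_coding\n"
    "GENE_B\tprotein_coding\n"
    "GENE_C\tlncRNA\n"
    "GENE_D\tprotein_coding\n"
)

proportions = biotype_proportions(example)
# Assumption: only "protein_coding" is treated as coding here.
coding = proportions.get("protein_coding", 0.0)
print(f"coding: {coding:.2f}, non-coding: {1 - coding:.2f}")
```

To try it on a real file, the `io.StringIO` object could be replaced with `open(path, newline="")`, provided the header names match the export.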