
Commit 26516bb2 authored by Florian Centler

Update README.md

parent 7e3265e5
@@ -85,7 +85,9 @@ CMP consists of three Bash scripts which are all called from the main script `main.sh`
8. Apply FastQC to all cleaned read files (PostQualCheck)
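The quality-check sub-steps could be invoked along these lines (a minimal dry-run sketch: the `qc_all` helper, the output-folder name, and the file names are assumptions, not the pipeline's actual code; `run_cmd` only echoes the FastQC call instead of executing it):

```shell
#!/usr/bin/env bash
# Dry-run sketch of the FastQC step: build one call per cleaned read file.
# run_cmd only echoes, so nothing is executed; replace echo with the real
# call to actually run FastQC.
run_cmd() { echo "fastqc -o $1 $2"; }

qc_all() {
  outdir=$1; shift          # first argument: FastQC output folder
  for f in "$@"; do         # remaining arguments: cleaned read files
    run_cmd "$outdir" "$f"
  done
}

# Example (hypothetical file names):
# qc_all PostQualCheck sampleA_clean.fastq.gz sampleB_clean.fastq.gz
```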
### Step 2: Pre-Taxonomical-Characterization
This step is integrated within `main.sh` and provides a first insight into the composition of the microbial community using Metaxa2.
It is a useful step for selecting the reference database of expected species if no other information on the community composition is available.
Please refer to this [NCBI FAQ](https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#downloadservice) for semi-automatic download of FASTA files for the selected expected species.
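Assembly FASTA files on the NCBI genomes FTP server live in directories derived from the assembly accession (split into three-digit groups, as described in the linked FAQ). A small helper along these lines can build the download URL; the function name is a hypothetical illustration, not part of CMP:

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the NCBI genomes FTP directory for an
# assembly accession such as GCF_000005845.2. The path pattern
# (prefix + three-digit groups of the numeric part) follows the NCBI FAQ.
ncbi_ftp_dir() {
  acc=$1                                   # e.g. GCF_000005845.2
  prefix=${acc%%_*}                        # GCF
  digits=${acc#*_}; digits=${digits%%.*}   # 000005845
  p1=${digits:0:3}; p2=${digits:3:3}; p3=${digits:6:3}
  echo "https://ftp.ncbi.nlm.nih.gov/genomes/all/${prefix}/${p1}/${p2}/${p3}/"
}

# The returned directory can then be fetched with wget/curl and the
# *_genomic.fna.gz file placed under $path/Reference/BaseFasta.
```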
### Step 3: Read Mapping
1. The step starts with Bowtie2: a) build the Bowtie2 index from the desired reference database (under BaseFasta), b) map the reads with Bowtie2, creating a SAM file
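Sub-steps a) and b) correspond roughly to the following commands (a dry-run sketch: `bt2_cmds` only prints the commands, and the index prefix and file names are assumptions; CMP's actual invocation may use additional options):

```shell
#!/usr/bin/env bash
# Sketch of the two Bowtie2 sub-steps as a dry run: the function prints
# the commands instead of executing them. File/index names are examples.
bt2_cmds() {
  ref=$1 idx=$2 r1=$3 r2=$4 sam=$5
  echo "bowtie2-build $ref $idx"                  # a) build the index
  echo "bowtie2 -x $idx -1 $r1 -2 $r2 -S $sam"    # b) map paired reads to SAM
}

# Example (hypothetical names):
# bt2_cmds BaseFasta/ref.fasta refIdx reads_1.fq reads_2.fq mapped.sam
```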
@@ -114,7 +116,7 @@ Diamond and MEGAN6 are used here to annotate the taxonomy and potential function
3. First, decide between manual and automatic mode: **Manual mode:** gives you the possibility to verify that the data is correctly set up (points 4 and 5); the user is guided through the process by a series of questions, and answering "no" to any question will exit the script.
**Automatic mode:** you have prepared all settings, folders, references, and original files (points 4 and 5) beforehand and can therefore run the full analysis automatically without further user intervention.
4. Store all original raw read files (from Illumina sequencing) inside `$path/OrgFiles` (the folder should be created beforehand to avoid errors during the pipeline run) and rename the files according to the pattern under meinarrayF and meinarrayR.
5. In the best case, you have one or more reference genome(s) for the first data-reduction step. In this case, please download the respective genomes from the NCBI databases (see the [NCBI FAQ](https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/#downloadservice) for semi-automatic download of reference genomes) and save them as FASTA files inside the folder `$path/Reference/BaseFasta` (folder and subfolder should be created beforehand).
If this is not the case, start `main.sh` in manual mode without pre-stored reference genomes and follow the manual-mode instructions (Pre-Taxonomical Characterization)!
NOTE: Only take high-abundance species (> 30 matches) as reference (e.g., based on the Metaxa2 output), or use information from other methods, such as 16S rRNA or mcrA analysis. A single reference genome is also sufficient!
6. Regardless of manual or automatic mode, at the end of the analysis you will be asked whether intermediate results should be deleted.
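The manual-mode question gate described in point 3 can be sketched as follows (a hypothetical `confirm` helper, not CMP's actual code: every question must be answered "yes", anything else aborts):

```shell
#!/usr/bin/env bash
# Sketch of a manual-mode gate: ask a yes/no question on stdin and
# succeed only on the exact answer "yes". confirm is a hypothetical
# helper illustrating the behavior described in the README.
confirm() {
  printf '%s [yes/no] ' "$1"
  read -r answer
  [ "$answer" = "yes" ] || { echo "Aborting."; return 1; }
}

# Usage inside a script (hypothetical question text):
# confirm "Are the raw reads stored in \$path/OrgFiles?" || exit 1
```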