Commit 96b07b73 authored by Florian Centler's avatar Florian Centler

Update README.md

parent 7dc9303b
Exp=Define a specific experimental name (e.g. Exp=Experiment_ID)
### Step 1: Preprocessing and clean-up of raw data
1. Create an output folder for all clean-up processes
2. Unzip the renamed original raw read files
3. Apply FastQC on raw read files (select forward and reverse) (PreQualCheck)
4. Clean-up and quality scan with Trimmomatic (default is PE) (CleanUp)
5. Generate interleaved paired end files (Velvet, “shuffleSequences”) leading to Interleaved_SampleID.fastq (only PE)
6. Create further output files with different content, resulting in CombinedUE_SampleID.fastq (only SE), Total_SampleID.fastq (PE+SE)
7. Calculate the read length of the total cleaned file (Total_SampleID.fastq)
8. Apply FastQC on cleaned reads of all cleaned files (PostQualCheck)
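The file handling in steps 5–7 can be sketched in plain shell. All file names below are placeholders, and the pipeline itself uses Velvet's shuffleSequences script for the interleaving; the `paste`/`awk` equivalent here is for illustration only:

```shell
#!/bin/bash
# Create two tiny paired-end FASTQ files standing in for Trimmomatic output
# (placeholder names, not the pipeline's actual file naming):
cat > demo_R1.fastq <<'EOF'
@read1/1
ACGTACGT
+
IIIIIIII
@read2/1
ACGT
+
IIII
EOF
cat > demo_R2.fastq <<'EOF'
@read1/2
TGCATGCA
+
IIIIIIII
@read2/2
TGCA
+
IIII
EOF

# Step 5: interleave the paired files (each FASTQ record is 4 lines;
# paste - - - - folds one record onto one tab-separated line):
paste <(paste - - - - < demo_R1.fastq) <(paste - - - - < demo_R2.fastq) \
  | tr '\t' '\n' > Interleaved_demo.fastq

# Step 6: combine interleaved PE reads (plus any SE reads) into a total file:
cat Interleaved_demo.fastq > Total_demo.fastq

# Step 7: mean read length of the total cleaned file:
awk 'NR % 4 == 2 { sum += length($0); n++ } END { print sum / n }' Total_demo.fastq
# prints 6 for this demo (read lengths 8, 4, 8, 4)
```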
### Step 2: Pre-Taxonomical-Characterization
This step is integrated within the `main.sh` and provides a first insight into the composition of the microbial community using Metaxa2. It is useful for selecting the reference database for Bowtie2 and for selecting the cleaned reads based on the expected organisms (the FASTA files of possible species must be downloaded manually).
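As a hedged sketch (the flag names follow the tools' own documentation rather than `main.sh`, and all file names are placeholders), this pre-characterization and the subsequent Bowtie2 index build could look like the following; the `command -v` guards keep it a no-op where the tools are not installed:

```shell
#!/bin/bash
SampleID=demo                       # placeholder sample name
PRETAX_PREFIX="PreTax_${SampleID}"  # assumed output prefix for Metaxa2

if command -v metaxa2 >/dev/null 2>&1; then
  # first taxonomic insight from the cleaned paired-end reads (SSU rRNA)
  metaxa2 -1 "Clean_${SampleID}_R1.fastq" -2 "Clean_${SampleID}_R2.fastq" \
          -o "$PRETAX_PREFIX" -g ssu
fi

if command -v bowtie2-build >/dev/null 2>&1; then
  # index the manually downloaded reference FASTA for the later mapping step
  bowtie2-build Reference/BaseFasta/ExpectedSpecies.fasta ExpectedSpecies_index
fi
```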
## Running CMP
1. Perform Step 1 as described above.
2. Start the `main.sh` on your command line
3. First, decide between manual and automatic mode. **Manual mode:** lets you verify the completeness of steps 4)–5); the script pauses at each question and only continues after you enter “y ↳” (entering “n” aborts the script).
**Automatic mode:** you have prepared all settings, folders, references, and original files as described in steps 4)–5) beforehand, and can therefore run the full analysis without interruptions.
4. Store all original raw read files (from Illumina sequencing) inside $path/OrgFiles (create this folder beforehand to avoid errors during the pipeline run) and rename the files according to the patterns defined in meinarrayF and meinarrayR.
   In manual mode, you are prompted for this during the run.
5. Ideally, you already know some reference genomes (or one important genome) for the first data-reduction step. If this is the case, please download all relevant genomes from the NCBI databases and save the files as FASTA inside the folder $path/Reference/BaseFasta (this folder and its subfolders should be created beforehand).
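The pause-per-question behaviour of manual mode can be pictured with a small helper like this (a hypothetical sketch, not code taken from `main.sh`): the script only continues after “y”, and any other answer aborts the run:

```shell
#!/bin/bash
# Hypothetical helper mirroring the manual-mode prompts described above.
confirm() {
  printf '%s [y/n] ' "$1"
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborting pipeline." >&2
    exit 1
  fi
}

# Example prompts (commented out so the sketch runs non-interactively):
# confirm 'Are all raw read files stored in $path/OrgFiles?'
# confirm 'Do the file names match the meinarrayF/meinarrayR patterns?'
```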