Commit 8e51bef1 authored by Florian Centler

Update README.md

parent 15a0314d
......@@ -23,7 +23,6 @@ CMP makes use of a number of software packages, which need to be available on th
* MetaPhlAn
* Bowtie2
* IDBA-UD
* ABySS
* MaxBin2
* CheckM
* bam2fastq
......@@ -187,44 +186,23 @@ daa2rma –i /path-to-outputfolder/all_mapped_IDx_total.daa –lg –o /path-to-
Finally, `*.rma` files can be loaded in MEGAN6 ("Open"). Consult the MEGAN manual to find out which analysis options are available.
### Analyzing the whole community
Depending on the ratio of mapped to unmapped reads, the share of mapped reads may be low, because only the first alignment of each read is reported. To reconstruct the whole community in the correct proportions, calculate as follows:
1) Calculate the community composition per species (or other taxonomic level) separately for the mapped and for the unmapped reads, based on the MEGAN6 taxonomy. E.g.:

| Species | mapped_output (sum) | mapped_output (%) | unmapped_output (sum) | unmapped_output (%) |
|---|---|---|---|---|
| Species1 | 1232 | 93.05 | 12032 | 92.90 |
| Species2 | 90 | 6.80 | 900 | 6.95 |
| SpeciesN | 2 | 0.15 | 20 | 0.15 |
| Sum | 1324 | | 12952 | |

2) Look up the "overall alignment rate" (OAR) in the Bowtie2 log file. E.g.:
OAR: 31 %, so unmapped: 100 % - 31 % = 69 %
For each species in the mapped output, calculate:
(OAR * map_Species%) / 100, e.g. (31 * 93.05) / 100 = 28.85 %
For each species in the unmapped output, calculate:
((100 - OAR) * unm_Species%) / 100, e.g. (69 * 92.90) / 100 = 64.10 %
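
The weighting described above can be sketched in a few lines of Python. This is only an illustration using the example numbers from this section; the function name `weighted_shares` and the dictionary layout are assumptions and not part of the pipeline.

```python
def weighted_shares(counts, weight_percent):
    """Scale per-species read counts to a share (in %) of the whole community.

    counts: {species: read count} taken from one MEGAN6 output
    weight_percent: OAR for mapped reads, 100 - OAR for unmapped reads
    """
    total = sum(counts.values())
    return {name: weight_percent * n / total for name, n in counts.items()}


# Example values from the text above (OAR read from the Bowtie2 log file).
oar = 31.0
mapped = {"Species1": 1232, "Species2": 90, "SpeciesN": 2}
unmapped = {"Species1": 12032, "Species2": 900, "SpeciesN": 20}

mapped_share = weighted_shares(mapped, oar)
unmapped_share = weighted_shares(unmapped, 100.0 - oar)

for name in mapped:
    print(f"{name}: mapped {mapped_share[name]:.2f} %, "
          f"unmapped {unmapped_share[name]:.2f} %")
```

Because the sketch works on the raw counts rather than on rounded percentages, its results can differ from a hand calculation in the last digit. All weighted shares of both outputs together add up to 100 %.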
#########################################################################################################
#Additional analyzing with ABySS#
#########################################################################################################
If you would like to use ABySS as the assembler, the procedure is more complicated, but executable scripts are provided.
Under MCB-MG-Pipeline/bin/SingleScripts there are three additional scripts:
- `check_builded_fastq_script`: can be used if your FastQ file is invalid (e.g. for mock datasets):
`$pathSH/SingleScripts/check_builded_fastq_script < input.fastq > checked_output.fastq`
- `Pre-KmerCheck.sh`: If you want to run ABySS with optimal k-mer values, you must determine them beforehand. This script produces the output for all k-mer values between 51 and 248 (step size 4). Afterwards, a `stat_total_IDx` should be generated under `kcheck_unmap/unmap_IDx_kcheck`, containing all values relevant for the k-mer comparison. Ideally, the N50 value should be high and the maximal contig length should be a good compromise!
Note: Before running this script, please redefine `path`, `f` and `meinarrayNew` in the header of the script!
- `Only_ABYSS.sh`: This script runs only the ABySS-based assembly, binning and reassembly (also with ABySS). The output is marked with the label "abyss" in the Assembly folder.
Note: Before running this script, please redefine `path`, `pathT`, `pathSH`, `f`, `meinarray*`, `Exp` and `kdef` (based on the pre-check, or on the rule of thumb: average read length of unmapped reads per sample minus 10 bp) in the header of the script!
- R-Script: combines function and taxonomy of the MEGAN6 output (not ready yet)
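
The rule of thumb for `kdef` mentioned above (average read length of the unmapped reads minus 10 bp) can be estimated with a short Python sketch. It assumes a plain FASTQ file with strictly four lines per record; the function names and the file name in the usage comment are hypothetical and not part of the pipeline.

```python
from itertools import islice


def average_read_length(fastq_lines):
    """Mean sequence length, assuming strict 4-line FASTQ records."""
    # The sequence is the second line of every 4-line record.
    lengths = [len(line.strip()) for line in islice(fastq_lines, 1, None, 4)]
    if not lengths:
        raise ValueError("no reads found")
    return sum(lengths) / len(lengths)


def rule_of_thumb_k(fastq_lines, offset=10):
    """Average read length minus `offset` bp, rounded to a whole number."""
    return round(average_read_length(fastq_lines)) - offset


# Tiny in-memory example with two reads of length 100 and 120.
example = [
    "@read1", "A" * 100, "+", "I" * 100,
    "@read2", "C" * 120, "+", "I" * 120,
]
k_estimate = rule_of_thumb_k(example)
print(k_estimate)

# With a real file (hypothetical name):
# with open("unmapped_IDx.fastq") as handle:
#     print(rule_of_thumb_k(handle))
```

The result is only a starting value; the text above recommends comparing N50 and maximal contig length from the pre-check where possible.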
### Combining taxonomic and functional information
To combine taxonomic with functional information for reads, the R script `AnnotationCombine.r` can be used:
1. Open the `*.rma` file in MEGAN6
2. In the taxonomic tree: uncollapse the tree and select the relevant subtrees (including all sub-nodes!)
3. Export as CSV, selecting the option "read-id-taxonomic-path-including-percentage" (`export what=CSV format=readName_to_taxonPathPercent separator=tab counts=summarized`)
4. In the functional tree: uncollapse the tree and select all leaves only
5. Export as CSV, selecting the option "read-id-functional-path" (`export what=CSV format=readName_to_eggnogPath separator=tab counts=summarized`)
6. Make sure that in both cases the number of lines in the exported CSV file matches the value reported in the MEGAN6 tree visualization! (The script assumes that for each read, there is one unique annotation for taxonomy and/or function)
7. Adjust the filenames in `AnnotationCombine.R`, and select the appropriate levels to analyze (close to the end of the script)
8. Run `AnnotationCombine.R` (RStudio recommended, where other levels can subsequently be analyzed as well)
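
The check in step 6 and the general idea of joining both exports by read ID can be illustrated with a minimal Python sketch. It is not the logic of the R script itself. It assumes that each exported line holds the read ID in the first tab-separated column and the annotation path in the second; the example paths are invented.

```python
import csv
import io
from collections import Counter


def read_annotation(handle):
    """Return {read_id: annotation}; fail if a read ID occurs more than once."""
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or incomplete lines
        read_id, annotation = row[0], row[1]
        if read_id in annotations:
            raise ValueError(f"read {read_id!r} has more than one annotation")
        annotations[read_id] = annotation
    return annotations


def combine(taxonomy, function):
    """Count reads per (taxon, function) pair for reads present in both tables."""
    shared = taxonomy.keys() & function.keys()
    return Counter((taxonomy[r], function[r]) for r in shared)


# Invented example data standing in for the two exported files.
tax_file = io.StringIO("read1\tBacteria;Firmicutes;\nread2\tArchaea;Euryarchaeota;\n")
fun_file = io.StringIO("read1\tMetabolism;Energy;\nread3\tMetabolism;Transport;\n")

taxonomy = read_annotation(tax_file)
function = read_annotation(fun_file)
combined = combine(taxonomy, function)

for (taxon, func), n_reads in combined.items():
    print(taxon, func, n_reads, sep="\t")
```

Comparing `len(taxonomy)` and `len(function)` with the counts shown in the MEGAN6 trees is a quick way to carry out the check from step 6.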
## Author
* **Daniela Becker**, UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
* **Daniela Becker** (all except AnnotationCombine.r), UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
* **Florian Centler** (AnnotationCombine.r), UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany
Contact: daniela.taraba@ufz.de
......