
Commit 15a0314d authored by Florian Centler

Update README.md

parent 41ef3421
@@ -160,26 +160,32 @@ Inside the `$path/` folder, the following sub-folders will be created:
* `$path/Annotation` contains all files which are the base input for external annotation with Diamond and MEGAN6, per sample (grouped mapped/unmapped).
### External annotation with Diamond and MEGAN6
To finalize the analysis, the annotation steps must be performed externally, because DIAMOND needs a lot of computational power to generate an NCBI-database-based mapping file per sample. Therefore we use the EVE cluster.
To finalize our analysis process, the annotation is performed externally. This allows for switching to a more powerful machine, as Diamond needs a lot of computational power to generate an NCBI-database-based mapping file per sample.
The first step for annotation is to generate an nr.dmnd file with Diamond from the nr.gz file downloaded from NCBI. This step must be performed only the first time; afterwards, use the same nr file for every run:
The first step for annotation is to generate an `nr.dmnd` file with Diamond from `nr.gz`, which is downloaded from NCBI. This step only needs to be performed once:
```
wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
diamond makedb --in /path/to/nr.gz --db /path/to/store/nr
```
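Because the database only has to be built once, the step can be guarded by a check for the existing `nr.dmnd` file. The following is a minimal sketch, not part of the pipeline: the function name `build_nr_db` and the `RUN` dry-run variable are hypothetical, and both paths are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not part of MCB-MG-Pipeline): build the Diamond
# database only if <db_prefix>.dmnd does not exist yet.
# Usage: build_nr_db /path/to/nr.gz /path/to/store/nr
build_nr_db() {
    nr_gz=$1       # downloaded nr.gz (placeholder path)
    db_prefix=$2   # database path without the .dmnd extension

    if [ -f "${db_prefix}.dmnd" ]; then
        echo "skip: ${db_prefix}.dmnd already exists"
        return 0
    fi

    # Assumption: setting RUN=echo gives a dry run that only prints the command.
    ${RUN:-} diamond makedb --in "$nr_gz" --db "$db_prefix"
}
```

Calling `build_nr_db /path/to/nr.gz /path/to/store/nr` a second time should then do nothing except report that the database is already present.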
After generating the nr.dmnd file, the blastx run of Diamond can be started. To do so, use diamond_daa_maske.sub in the subScript folder of MCB-MG-Pipeline. Inside this sub file, change the /data/…/temp/ path to your desired temp folder (create a temp folder under your desired path). You can use this path for every run, so it must be changed only the first time. After adapting the sub file, use the following commands on the EVE cluster. NOTE: IDx must be replaced with your individual sample IDs (the input is found under $path/Annotation/IDx/mapped_annotation/input and $path/Annotation/IDx/unmapped_annotation/input):
qsub -l highmem diamond_daa_maske.sub blastx -q /path-to-input/all_mapped_IDx_total.fasta -d /path-to/nr -a /path-to-outputfolder/all_mapped_IDx_total.daa
qsub -l highmem diamond_daa_maske.sub blastx -q /path-to-input/unm_IDx_RAContig_total.fasta -d /path-to/nr -a /path-to-outputfolder/unm_IDx_RAContig_total.daa
After generation of the `nr.dmnd` file, the blastx run of Diamond can be started:
Now the *.daa file must be transformed into a *.rma file in order to use MEGAN6. There is also a diamond_daa2rma_maske.sub in the subScript folder of MCB-MG-Pipeline. Its temp path must be changed in the same way as in the first sub file.
qsub -l highmem diamond_daa2rma_maske.sub -i /path-to-outputfolder/unm_IDx_RAContig_total.daa -o /path-to-outputfolder/unm_IDx_RAContig_total.rma -a2t /data/umbsysbio/Dani/DB/prot_acc2tax-May2017.abin -a2eggnog /data/umbsysbio/Dani/DB/acc2eggnog-Oct2016X.abin -fun EGGNOG
```
diamond blastx -q /path-to-input/all_mapped_IDx_total.fasta -d /path-to/nr -a /path-to-outputfolder/all_mapped_IDx_total.daa
diamond blastx -q /path-to-input/unm_IDx_RAContig_total.fasta -d /path-to/nr -a /path-to-outputfolder/unm_IDx_RAContig_total.daa
```
NOTE: IDx must be replaced with your individual sample IDs (the input is found under `$path/Annotation/IDx/mapped_annotation/input` and `$path/Annotation/IDx/unmapped_annotation/input`).
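Substituting IDx by hand for every sample is error-prone, so the two blastx calls can be wrapped in a small function and run in a loop. This is only a sketch under assumptions: the function name `blastx_sample`, the `RUN` dry-run variable, and the output folder are hypothetical, and the FASTA files are assumed to sit directly in the two input folders named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MCB-MG-Pipeline): run both blastx calls
# for one sample ID.
# Usage: blastx_sample <base_dir> <sample_id> <db_prefix> <out_dir>
#   base_dir corresponds to $path in this README.
blastx_sample() {
    base_dir=$1
    id=$2
    db=$3
    out_dir=$4

    # Assumption: the input FASTA files are located directly in these folders.
    mapped_in="$base_dir/Annotation/$id/mapped_annotation/input/all_mapped_${id}_total.fasta"
    unmapped_in="$base_dir/Annotation/$id/unmapped_annotation/input/unm_${id}_RAContig_total.fasta"

    # Assumption: setting RUN=echo gives a dry run that only prints the commands.
    ${RUN:-} diamond blastx -q "$mapped_in" -d "$db" -a "$out_dir/all_mapped_${id}_total.daa"
    ${RUN:-} diamond blastx -q "$unmapped_in" -d "$db" -a "$out_dir/unm_${id}_RAContig_total.daa"
}
```

A loop such as `for id in S1 S2; do blastx_sample "$path" "$id" /path-to/nr /path-to-outputfolder; done` would then process several samples in sequence; `S1` and `S2` stand in for real sample IDs.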
Now the `*.daa` files must be transformed into `*.rma` files using MEGAN6:
qsub -l highmem diamond_daa2rma_maske.sub -i /path-to-outputfolder/all_mapped_IDx_total.daa -lg -o /path-to-outputfolder/all_mapped_IDx_total.rma -a2t /data/umbsysbio/Dani/DB/prot_acc2tax-May2017.abin -a2eggnog /data/umbsysbio/Dani/DB/acc2eggnog-Oct2016X.abin -fun EGGNOG
-lg means long reads, because the unmapped reads are contigs
-pof could also be an additional setting for paired-end reads in one file
-fwa (first word is accession)
```
daa2rma -i /path-to-outputfolder/unm_IDx_RAContig_total.daa -o /path-to-outputfolder/unm_IDx_RAContig_total.rma -a2t /data/umbsysbio/Dani/DB/prot_acc2tax-May2017.abin -a2eggnog /data/umbsysbio/Dani/DB/acc2eggnog-Oct2016X.abin -fun EGGNOG
daa2rma -i /path-to-outputfolder/all_mapped_IDx_total.daa -lg -o /path-to-outputfolder/all_mapped_IDx_total.rma -a2t /data/umbsysbio/Dani/DB/prot_acc2tax-May2017.abin -a2eggnog /data/umbsysbio/Dani/DB/acc2eggnog-Oct2016X.abin -fun EGGNOG
```
`-lg` means long reads, because the unmapped reads are contigs; `-pof` is an additional setting that can be used for paired-end reads in one file; `-fwa` indicates that the first word is the accession.
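Since the `daa2rma` calls differ only in the input file and in optional flags such as `-lg`, they can be wrapped in one function that derives the `.rma` name from the `.daa` name. This is a sketch under assumptions: `daa_to_rma`, `RUN`, `A2T` and `A2EGGNOG` are hypothetical names, the mapping file paths must be set by the user, and the optional flags are appended at the end of the command on the assumption that option order does not matter.

```shell
#!/bin/sh
# Hypothetical helper (not part of MCB-MG-Pipeline): convert one .daa file
# to an .rma file next to it.
# Usage: daa_to_rma <file.daa> [extra daa2rma flags, e.g. -lg]
# Requires A2T and A2EGGNOG to point to the MEGAN mapping files.
daa_to_rma() {
    daa=$1
    shift
    rma="${daa%.daa}.rma"   # same path, .daa suffix replaced by .rma

    # Assumption: setting RUN=echo gives a dry run that only prints the command.
    ${RUN:-} daa2rma -i "$daa" -o "$rma" \
        -a2t "$A2T" -a2eggnog "$A2EGGNOG" -fun EGGNOG "$@"
}
```

For example, `daa_to_rma /path-to-outputfolder/all_mapped_IDx_total.daa -lg` would pass `-lg` through unchanged, while a call without extra arguments uses only the common options.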
After all preparation steps, the *.rma file can be opened in MEGAN6 with Open. See the MEGAN manual to find out which analysis options are available.
Finally, `*.rma` files can be loaded in MEGAN6 ("Open"). See the MEGAN manual to find out which analysis options are available.
### Analyzing the whole community
Depending on the ratio of mapped to unmapped reads, the number of mapped reads may be low, because only the first alignment of each read is considered. To reconstruct the whole community in the correct proportions, please calculate as follows: