- Eggnog-mapper - to download EggNOG v5 database for functional annotation
- BLAST+ - to download and create BLAST NT database for genome content identification and filtering
- Mamba - to install all the tool included in Pannopi
- Snakemake - to run Pannopi
Pannopi is available in conda, to install and set is use following commands:
- Download Pannopi in separate conda envieronment:
conda create -n pannopi -c conda-forge -c bioconda -c aglab pannopi
- Activate the envieronment:
conda activate pannopi
- Eggnog-mapper databae (~50GB) is required to run Pannopi. BLAST NT (~110 GB) is needed for genome
filtration, but not necessary. You can download each of it together or separately or set your own ones,
if you have it already. Use
pannopi_download_db
tool to set or download databases. Examples:# Download all databases pannopi_download_db -m all -o /path/to/database/directory # Download eggnog-mapper/blast database pannopi_download_db -m eggnog/blast -o /path/to/database/directory # Set yours eggnog or blast database pannopi_download_db -m set -e /path/to/eggnog/database -b /path/to/blast/database/nt
- To run Pannopi on your reads use one of the following commands:
# If you have only short reads pannopi -m short -1 /path/to/forward_read_1.fastq -2 /path/to/reverse_read_2.fastq -r /path/to/reference.fasta -t 32 -o /path/to/outdir # If you have only long reads pannopi -m long -l /path/to/long_read.fastq -t 32 -o /path/to/outdir # If you have both short and long reads pannopi -m hybrid -1 /path/to/forward_read_1.fastq -2 /path/to/reverse_read_2.fastq -l /path/to/long_read.fastq -t 32 -o /path/to/outdir # If you have only assembly and wanna evaluate QC and annotate your genome pannopi -m anno -a /path/to/assembly.fasta -r /path/to/reference -t 32 -o /path/to/outdir
All modes except anno are used Unicycler tool for genome assembly, modes are:
- Short - mode for analysis of short reads. Starts with preparation of reads with v2trim and rmdub. Data preparation QC with FastQC, Jellyfish and GenomeScope v2.
- Long - mode for analysis of short reads.
- Hybrid - mode for analysis of both short- and long-reads with hybrid assembly. Reads QC as in Short mode.
- Anno - mode to evaluate quality of your assembly and to annotate it.
- Filtration of technical sequences contamination with Contera. If BLAST NT database is provided Contera will report about genome taxonomic content.
- Statistical QC of assembly with QUAST.
- ANI analysis with FastANI. Works only if reference (or close) genome are provided with
-r
argument.
- Structural annotation with Prokka
- Functional annotation with eggnog-mapper
- Antibiotic resistance, virulence, plasmid and serotype (only for E. Coli) genes with Abricate
- MLST genes with MLST
- CRISPRs with CRISPRCasFinder (in progress...)
- Genome visualisation with pyCircos (in progress...)
-m (–mode). Mode to run the pipeline [short, long or hybrid]. Single
for paired-end Illumina reads, long for long-reads, hybrid for hybrid
assembly with both short and long reads. [Required]
-1 (–forward). Path to forward read FASTQ file. [Required for short and hybrid modes]
-2 (–reverse). Path to reverse read FASTQ file. [Required for short and hybrid modes]
-l (–long-read). Path to reverse read FASTQ file. [Required for long and hybrid modes]
-r (–reference). Path to reference genome in FASTA format.
-t (–threads). Number of threads to use [Default is 4].
-o (–outdir). Path to output directory to store results. [Required]
parameter.
-d (–debug). Debug mode to check pipeline workflow.
-h (–help). Help message with arguments description.
- Köster, J., & Rahmann, S. (2012). Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520-2522. [https://doi.org/10.1093/bioinformatics/bts480]
- Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS computational biology, 13(6), e1005595. [https://doi.org/10.1371/journal.pcbi.1005595]
- Gurevich, A., Saveliev, V., Vyahhi, N., & Tesler, G. (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072-1075. [https://doi.org/10.1093/bioinformatics/btt086]
- Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068-2069. [https://doi.org/10.1093/bioinformatics/btu153]
- Huerta-Cepas, J., Forslund, K., Coelho, L. P., Szklarczyk, D., Jensen, L. J., Von Mering, C., & Bork, P. (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution, 34(8), 2115-2122. [https://doi.org/10.1093/molbev/msx148]
- Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., ... & Bork, P. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research, 47(D1), D309-D314. [https://doi.org/10.1093/nar/gky1085]
- Seppey, M., Manni, M., & Zdobnov, E. M. (2019). BUSCO: assessing genome assembly and annotation completeness. In Gene prediction (pp. 227-245). Humana, New York, NY. [https://doi.org/10.1007/978-1-4939-9173-0_14]
- Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T., & Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature communications, 9(1), 1-8. [https://doi.org/10.1038/s41467-018-07641-9]
- Ranallo-Benavidez, T. R., Jaron, K. S., & Schatz, M. C. (2020). GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nature Communications, 11(1), 1-10. [https://doi.org/10.1038/s41467-020-14998-3]