Project Name: Test
Report generated: June 28, 2024
Number of Samples: 21
This report summarizes results of analyzing metagenomics sequence data.
The Reads Per Sample plot shows the total number of reads in
each sample. The Read assignment to taxonomic groups
(classification) plot summarizes classification of reads into
different taxonomic groups. The taxonomic group unclassified
contains reads that could not be uniquely assigned to a taxonomic group.
Reads in the unclassified group are excluded from subsequent
analysis. The Top N most abundant taxa plot shows the top N most
abundant taxonomic groups for each sample. N is an integer specified
during analysis. By default, the top 10 most abundant taxonomic groups are
shown. The Alpha Diversity (Shannon Index) plot shows the diversity of
each sample, measured with the Shannon index. The remaining
two plots are a dendrogram and a PCA plot generated from the beta
diversity matrix. Both show the relationship
between different samples.
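As a rough illustration of what the Alpha Diversity plot summarizes, the Shannon index of one sample can be computed from its per-taxon read counts. The sketch below assumes the natural logarithm and uses made-up counts; the function name is hypothetical, and the exact conventions used by the scripts behind this report are not stated here.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts.

    Assumption: natural log; other tools may use log base 2.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total  # relative abundance of this taxon
            h -= p * math.log(p)
    return h


# Hypothetical read counts for one sample (not taken from this report).
example_counts = [50, 30, 20]
print(round(shannon_index(example_counts), 4))
```

A sample dominated by a single taxon gives a value near 0, while a sample whose reads are spread evenly over k taxa gives ln(k).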
To make it easier for experienced users to perform their own analysis,
we provide abundance estimates (abundance_read_count.biom) as
determined by Bracken [3] in BIOM format. The file contains the
number of reads assigned to each species in each sample. This
count table can be imported into popular metagenomics analysis tools
such as QIIME 2 and mothur for additional analysis. Below are commands to load the
data into QIIME 2 and mothur. Note that these commands were current as of
June 2024 and might have changed since. Be sure to check the documentation
for the most current commands.
QIIME 2
# Load count data into QIIME 2
> qiime tools import --input-path abundance_read_count.biom --type 'FeatureTable[Frequency]' --input-format BIOMV100Format --output-path qiime
mothur
# Load count data into mothur
> make.shared(biom=abundance_read_count.biom)
We also provide a tab-delimited text file (raw_read_count.txt) with raw read counts as determined by Kraken 2 [2]. Users who would like to use a different approach to estimate abundance can use this raw read count file as input.
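For users who prefer to inspect the BIOM count table directly in Python, the sketch below reads a JSON-based BIOM 1.0 table using only the standard library. It assumes abundance_read_count.biom is BIOM 1.0 (JSON), which the BIOMV100Format option in the QIIME 2 command suggests but this report does not state outright; HDF5-based BIOM 2.x files cannot be read this way. The function name and the tiny example table are hypothetical.

```python
import json


def biom_json_to_counts(biom_text):
    """Convert BIOM 1.0 (JSON) text into {sample_id: {taxon_id: count}}.

    In a BIOM table, rows are observations (taxa) and columns are samples.
    """
    table = json.loads(biom_text)
    taxa = [row["id"] for row in table["rows"]]
    samples = [col["id"] for col in table["columns"]]
    counts = {sample: {} for sample in samples}
    if table["matrix_type"] == "sparse":
        # Sparse data is a list of [row_index, column_index, value] triples.
        for r, c, value in table["data"]:
            counts[samples[c]][taxa[r]] = value
    else:
        # Dense data is a list of rows, one value per column; zeros are skipped.
        for r, row_values in enumerate(table["data"]):
            for c, value in enumerate(row_values):
                if value:
                    counts[samples[c]][taxa[r]] = value
    return counts


# Tiny hypothetical table (two taxa, two samples), not data from this report.
EXAMPLE_BIOM = json.dumps({
    "rows": [{"id": "Taxon_A", "metadata": None},
             {"id": "Taxon_B", "metadata": None}],
    "columns": [{"id": "Sample_1", "metadata": None},
                {"id": "Sample_2", "metadata": None}],
    "matrix_type": "sparse",
    "shape": [2, 2],
    "data": [[0, 0, 120], [1, 0, 30], [1, 1, 75]],
})
print(biom_json_to_counts(EXAMPLE_BIOM))
```

To apply it to the provided file, read the file contents as text and pass them to the function. If parsing fails with a JSON error, the file is probably not BIOM 1.0 and a dedicated BIOM reader is needed instead.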
Analysis followed the protocol in Lu et al. [1], with UMGC in-house scripts used to generate this report and provide additional quality metrics. The main steps of the analysis are summarized below.
Step 1: Read Count
Reads in each sample were counted using UMGC in-house scripts.
Step 2: Read Classification
Kraken 2 [2] was used to assign taxonomic labels to metagenomic DNA sequences (reads) using the nt database as reference.
Step 3: Abundance Estimation
Bracken [3] was used to estimate abundance. By default, estimation is done at the species level.
Step 4: Generate Data & Report
Python analysis scripts packaged with Bracken [3] were used to compute alpha and beta diversity metrics. The other accompanying data were generated using QIIME 2.
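The beta diversity files in the data folder report Bray-Curtis dissimilarity. As a minimal sketch of that metric, the function below compares two samples given as per-taxon read counts. It works on raw counts; whether the Bracken scripts or QIIME 2 normalize the counts first is not stated in this report, so values from this sketch may not match the provided matrices exactly. The function name and example counts are hypothetical.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    BC = sum_i |a_i - b_i| / sum_i (a_i + b_i), where a_i and b_i are the
    counts of taxon i in each sample. Inputs are {taxon: count} dicts;
    a taxon missing from one sample is treated as a count of 0.
    """
    taxa = set(counts_a) | set(counts_b)
    numerator = sum(abs(counts_a.get(t, 0) - counts_b.get(t, 0)) for t in taxa)
    denominator = sum(counts_a.get(t, 0) + counts_b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


# Hypothetical counts for two samples (not taken from this report).
sample_1 = {"Taxon_A": 6, "Taxon_B": 4}
sample_2 = {"Taxon_A": 2, "Taxon_B": 8}
print(bray_curtis(sample_1, sample_2))
```

The value ranges from 0 for identical samples to 1 for samples that share no taxa.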
The data folder accompanying this report contains the following files:
final_stats.txt: Tab-delimited text file containing the data used to generate the plots in this report, including additional alpha diversity metrics such as the Berger-Parker dominance metric.
raw_read_count.txt: Tab-delimited text file of reads
assigned to each taxon by Kraken 2 [2].
abundance_read_count.biom: BIOM table containing abundance
estimates (read counts) as determined by Bracken [3].
Note that the total number of reads in
abundance_read_count.biom will differ from the total number
of reads in your FASTQ files because the abundance estimates
do not contain unclassified reads.
abundance_read_count.txt: Same data as in
abundance_read_count.biom but in a format that is easier to parse: a
tab-delimited text file with taxa in rows and samples in
columns.
bracken_beta_diversity_bray_curtis_dissimilarity.txt: Text
file containing the Bray-Curtis dissimilarity matrix computed using
Bracken [3].
qiime_beta_diversity_bray_curtis_dissimilarity.txt: Text
file containing the Bray-Curtis dissimilarity matrix computed using
QIIME 2.
QIIME 2 artifact files (.qza) contain the data and metadata from an analysis done using QIIME 2. These files can be analyzed further in QIIME 2, or you can view their contents using the online browser at https://view.qiime2.org.
abundance_read_count.qza: Abundance estimates (read
counts). Same data as in abundance_read_count.biom.
qiime_braycurtis.qza: Bray-Curtis dissimilarity
matrix.
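Because abundance_read_count.txt is described as a tab-delimited table with taxa in rows and samples in columns, it can be loaded with the Python standard library and used, for example, to list the most abundant taxa per sample. The sketch below assumes a header row whose first cell labels the taxon column and whose remaining cells are sample names; the actual header is not shown in this report, so adjust accordingly. The function names and the example table are hypothetical.

```python
import csv
import io


def read_abundance_table(handle):
    """Read a taxa-by-sample table into {sample: {taxon: count}}.

    Assumption: the first row is a header and the first column holds taxon names.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    table = {sample: {} for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        taxon = row[0]
        for sample, value in zip(samples, row[1:]):
            table[sample][taxon] = float(value)
    return table


def top_n_taxa(sample_counts, n=10):
    """Return the n most abundant (taxon, count) pairs, highest first."""
    ranked = sorted(sample_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical table contents (not data from this report).
EXAMPLE_TEXT = (
    "taxon\tSample_1\tSample_2\n"
    "Taxon_A\t120\t0\n"
    "Taxon_B\t30\t75\n"
    "Taxon_C\t50\t5\n"
)
example_table = read_abundance_table(io.StringIO(EXAMPLE_TEXT))
print(top_n_taxa(example_table["Sample_1"], n=2))
```

To read the provided file, open abundance_read_count.txt in text mode with newline="" and pass the file handle to read_abundance_table in place of the in-memory example.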
If there is a large difference in sequencing depth between samples, a deeply sequenced sample might appear to have greater diversity than a sample with low sequencing depth. Rarefaction is the process of subsampling reads to the same depth to control for differences in sequencing depth. Alpha rarefaction curves are typically used to choose the depth at which to rarefy the data. Although they are common with 16S rRNA data, alpha rarefaction curves are unlikely to provide useful information with shotgun data, for the reasons discussed below.
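To make the idea of rarefaction concrete, the sketch below subsamples one sample's reads to a fixed depth without replacement. It is an illustration only, not the procedure used by QIIME 2 or any other tool; the function name, seed parameter, and example counts are hypothetical. It expands the counts into one label per read, so it is practical only for small examples.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from {taxon: count}.

    Counts must be non-negative integers.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    # Expand the counts into one taxon label per read, then draw reads at random.
    reads = [taxon for taxon, count in counts.items() for _ in range(count)]
    sampled = rng.sample(reads, depth)
    rarefied = {}
    for taxon in sampled:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


# Hypothetical sample with 100 reads, rarefied to 40 reads.
example_counts = {"Taxon_A": 50, "Taxon_B": 30, "Taxon_C": 20}
print(rarefy(example_counts, depth=40, seed=1))
```

Rarefying every sample to the same depth and recomputing an alpha diversity metric at increasing depths is what produces an alpha rarefaction curve.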
A study by Weiss et al. [4] on normalization and microbial differential
abundance strategies showed that rarefying is beneficial when analyzing
samples with large differences in sequencing depth (~10X). Furthermore,
a study by Hillmann et al. [5] showed that a depth of approximately 0.5
million reads was sufficient for most applications, with a depth of 2 million
reads recommended for detecting rare species below a relative
abundance of approximately 0.0005. Very few shotgun studies have
differences greater than 10X in sequencing depth and fewer than 2
million reads. Considering the above, while common with 16S rRNA data,
alpha rarefaction curves are not as useful for most shotgun data, hence
we did not plot them.
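The two thresholds discussed above (a roughly 10X difference in depth and a depth of 2 million reads) can be checked against a project's per-sample read totals with a few lines of Python. The sketch below is a convenience check under those stated thresholds, not part of the analysis pipeline; the function name, parameter names, and read totals are hypothetical.

```python
def depth_summary(read_totals, min_depth=2_000_000, max_fold=10):
    """Summarize sequencing depth across samples.

    read_totals: {sample: total number of reads}.
    Returns the fold difference between the deepest and shallowest sample,
    whether it exceeds max_fold, and which samples fall below min_depth.
    """
    lowest = min(read_totals.values())
    highest = max(read_totals.values())
    fold = highest / lowest if lowest > 0 else float("inf")
    shallow = sorted(s for s, n in read_totals.items() if n < min_depth)
    return {
        "fold_difference": fold,
        "exceeds_max_fold": fold > max_fold,
        "below_min_depth": shallow,
    }


# Hypothetical read totals (not taken from this report).
example_totals = {"Sample_1": 3_000_000, "Sample_2": 2_500_000, "Sample_3": 1_500_000}
print(depth_summary(example_totals))
```

If the fold difference exceeds the threshold and some samples fall below the minimum depth, rarefying may be worth considering for that dataset.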
To confirm that alpha rarefaction curves are unlikely to provide additional useful information for shotgun data, we took WGS data from the study by Forry et al. [6], in which the same samples were analyzed in three different labs, and plotted alpha rarefaction curves. The results are shown in the figure below.
As the plot above shows, saturation happens at approximately 20,000 reads, confirming that alpha rarefaction curves are unlikely to provide additional useful information.
1. Lu, J., Rincon, N., Wood, D.E. et al. Metagenome analysis using the Kraken software suite. Nat Protoc 17, 2815–2839 (2022). https://doi.org/10.1038/s41596-022-00738-y
2. Wood, D.E., Salzberg, S.L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15, R46 (2014). https://doi.org/10.1186/gb-2014-15-3-r46
3. Lu, J., Breitwieser, F.P., Thielen, P., Salzberg, S.L. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3, e104 (2017). https://doi.org/10.7717/peerj-cs.104
4. Weiss, S., Xu, Z.Z., Peddada, S. et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5, 27 (2017). https://doi.org/10.1186/s40168-017-0237-y
5. Hillmann, B., Al-Ghalith, G.A., Shields-Cutler, R.R., Zhu, Q., Gohl, D.M., Beckman, K.B., Knight, R., Knights, D. Evaluating the Information Content of Shallow Shotgun Metagenomics. mSystems 3 (2018). https://doi.org/10.1128/msystems.00069-18
6. Forry, S.P., Servetas, S.L., Kralj, J.G. et al. Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols. Sci Rep 14, 9785 (2024). https://doi.org/10.1038/s41598-024-57981-4