Main statistics
metric | value |
---|---|
MIXED_read_mean_coverage | 5.01 |
PCT_MIXED_both_tags | 32.41 |
PCT_MIXED_both_tags_where_endreached | 32.71 |
PCT_MINUS_both_tags_where_endreached | 13.65 |
PCT_PLUS_both_tags_where_endreached | 29.38 |
PCT_UNDETERMINED_either_tag | 13.38 |
PCT_DISCORDANT | 10.88 |
PCT_read_end_unreached | 0.91 |
Mean_cvg | 15.45 |
Indel_Rate | 0.20 |
Mean_Read_Length | 76.34 |
PF_Barcode_reads | 658215149.00 |
PCT_PF_Reads_aligned | 98.74 |
PCT_Chimeras | 0.41 |
PCT_duplicates | 8.29 |
PCT_Failed_QC_reads | 0.00 |
PCT_failed_adapter_dimers | 0.14 |
PCT_failed_unrecognized_start_stem | 0.18 |
PCT_failed_unrecognized_start_loop | 0.07 |
- MIXED_read_mean_coverage is the mean coverage of reads in which the tags at both the start and the end of the read were detected as MIXED
- PCT_MIXED_both_tags is the percentage of all reads in which both loops were detected as MIXED
- PCT_MIXED_both_tags_where_endreached is the percentage of reads in which both loops were detected as MIXED, out of the reads where the read end was reached so that the end loop could be measured
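The two MIXED percentages can be cross-checked against the strand_ratio_category_concordance values reported in the detailed statistics. The following is a minimal sketch, assuming the denominator of the `_where_endreached` metric is the fraction of reads whose end loop is not END_UNREACHED; the variable names are illustrative and not part of the pipeline.

```python
# Normalized read fractions copied from the
# strand_ratio_category_concordance table of this report.
mixed_mixed = 0.324140  # start loop MIXED and end loop MIXED

# Fraction of reads annotated END_UNREACHED, summed over the four
# start-loop categories (MIXED, MINUS, PLUS, UNDETERMINED).
end_unreached = 0.006204 + 0.001169 + 0.001279 + 0.000464

# Percentage out of all reads.
pct_mixed_both_tags = 100 * mixed_mixed

# Percentage out of reads that reached the end loop (assumed denominator).
pct_mixed_both_tags_where_endreached = 100 * mixed_mixed / (1 - end_unreached)

print(round(pct_mixed_both_tags, 2))                   # 32.41
print(round(pct_mixed_both_tags_where_endreached, 2))  # 32.71
```

Both results agree with the values in the main statistics table, and `100 * end_unreached` reproduces PCT_read_end_unreached (0.91).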
QC plots
This barplot shows the ratio of each category type in the data, according to the spec at the top of the file. The categories are reported separately for the start and end loops. The end-loop breakdown is shown only for the reads that reached the end loop.
These plots show the concordance between the strand ratio categories of the start-loop and end-loop. Each loop is assigned a category separately, and the concordance is plotted. The top plot includes all the reads, including those with END_UNREACHED, while the bottom includes reads where the end was reached only.
This plot shows the homopolymers called in the A, T, G and C hmers in the start loop (left) and in the T, G, C, A hmers in the end loop (right). The loops are expected to yield:
- A signal of [1 1 1 1], AGCT and GCAT for the start and end loops, for MIXED reads
- A signal of [0 2 0 2], TTCC and CCTT for the start and end loops, for MINUS-only reads
- A signal of [2 0 2 0], AAGG and GGAA for the start and end loops, for PLUS-only reads
About ppmSeq
Identifying single nucleotide variants (SNVs) is fundamental to genomics. Consensus mutation calling, which requires multiple variant-containing reads to call genetic variation, is often used, but it is unsuitable for calling rare SNVs, such as in circulating tumor DNA or somatic mosaicism, where often only a single supporting read is available. Paired Plus and Minus strand Sequencing (ppmSeq), a PCR-free library preparation technology that uniquely leverages the Ultima Genomics clonal amplification process, overcomes this challenge. Here, DNA denaturation is not required prior to clonal amplification, so both native strands are clonally amplified on many sequencing beads, allowing for a linear increase in duplex recovery and scalable duplex coverage without requiring unique molecular identifiers or redundant sequencing.
In ppmSeq, modified Ultima Genomics adapters containing mismatched homopolymers are used to detect reads that are the result of the mixture of the two native DNA strands. While some reads are amplicons of only the Plus or Minus strands and are generally of typical UG read SNV accuracy, the so-called Mixed reads exhibit much lower error rates, well below 1E-6, facilitating the accurate detection of rare SNVs. Artifactual mutations manifesting on one strand only are common sources of error in SNV detection from NGS. While beads that are amplicons of Plus or Minus strand only are exposed to these artifacts that would appear as high-quality reads, in Mixed beads they create an inconsistent signal that translates into a low quality base or read, preventing them from being read as false positive SNVs.
This report is generated from preprocessing of the ppmSeq sequencing data, and is intended to be used as a QC report for the library prep and sequencing run. The distribution of the MINUS/PLUS ratio and the assignment of reads to categories (MIXED/MINUS/PLUS/UNDETERMINED) are shown, along with the raw calls.
ppmSeq adapter version
The ppmSeq_v1 adapter is used in this sample. It is composed of an AAGG-AAGG loop at the start and a GGAA-GGAA loop at the end of the read, so that reads are expected to ideally yield in each loop:
- TTCC and CCTT for MINUS-only reads
- AAGG and GGAA for PLUS-only reads
- AGCT and GCAT for 50% MINUS - 50% PLUS reads

Up to 2 homopolymer errors are allowed, as long as the distance from the second best fit is at least 4. Additionally, since the end loop is at the end of the read it is not necessarily reached, in which case the loop is annotated as END_UNREACHED.
Detailed statistics
Statistics table: keys_to_convert
index | key |
---|---|
0 | stats_shortlist |
1 | sorter_stats |
2 | strand_ratio_category_counts |
3 | strand_ratio_category_norm |
4 | strand_ratio_category_concordance |
5 | strand_ratio_category_consensus |
6 | trimmer_failure_codes |
Statistics table: sorter_stats
metric | value |
---|---|
Mean_cvg | 1.545000e+01 |
Indel_Rate | 2.000000e-01 |
Mean_Read_Length | 7.634000e+01 |
PF_Barcode_reads | 6.582151e+08 |
PCT_PF_Reads_aligned | 9.874000e+01 |
PCT_Chimeras | 4.100000e-01 |
PCT_duplicates | 8.290000e+00 |
PCT_Failed_QC_reads | 0.000000e+00 |
Statistics table: stats_shortlist
metric | value |
---|---|
MIXED_read_mean_coverage | 5.007969e+00 |
PCT_MIXED_both_tags | 3.241404e+01 |
PCT_MIXED_both_tags_where_endreached | 3.271226e+01 |
PCT_MINUS_both_tags_where_endreached | 1.365068e+01 |
PCT_PLUS_both_tags_where_endreached | 2.938198e+01 |
PCT_UNDETERMINED_either_tag | 1.337989e+01 |
PCT_DISCORDANT | 1.087520e+01 |
PCT_read_end_unreached | 9.116373e-01 |
Mean_cvg | 1.545000e+01 |
Indel_Rate | 2.000000e-01 |
Mean_Read_Length | 7.634000e+01 |
PF_Barcode_reads | 6.582151e+08 |
PCT_PF_Reads_aligned | 9.874000e+01 |
PCT_Chimeras | 4.100000e-01 |
PCT_duplicates | 8.290000e+00 |
PCT_Failed_QC_reads | 0.000000e+00 |
PCT_failed_adapter_dimers | 1.359672e-01 |
PCT_failed_unrecognized_start_stem | 1.774658e-01 |
PCT_failed_unrecognized_start_loop | 6.728910e-02 |
Statistics table: strand_ratio_category_concordance
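strand_ratio_category_start | strand_ratio_category_end | count_norm |
---|---|---|
MIXED | MIXED | 0.324140 |
MIXED | MINUS | 0.004714 |
MIXED | PLUS | 0.054290 |
MIXED | END_UNREACHED | 0.006204 |
MIXED | UNDETERMINED | 0.062611 |
MINUS | MIXED | 0.010230 |
MINUS | MINUS | 0.135262 |
MINUS | PLUS | 0.026237 |
MINUS | END_UNREACHED | 0.001169 |
MINUS | UNDETERMINED | 0.026921 |
PLUS | MIXED | 0.007893 |
PLUS | MINUS | 0.004397 |
PLUS | PLUS | 0.291141 |
PLUS | END_UNREACHED | 0.001279 |
PLUS | UNDETERMINED | 0.010217 |
UNDETERMINED | MIXED | 0.015450 |
UNDETERMINED | MINUS | 0.000791 |
UNDETERMINED | PLUS | 0.009439 |
UNDETERMINED | END_UNREACHED | 0.000464 |
UNDETERMINED | UNDETERMINED | 0.007149 |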
Statistics table: strand_ratio_category_consensus
strand_ratio_category_consensus | count_norm |
---|---|
MIXED | 0.327123 |
MINUS | 0.136507 |
PLUS | 0.293820 |
UNDETERMINED | 0.133799 |
DISCORDANT | 0.108752 |
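The consensus fractions can be reproduced from the concordance values. The sketch below uses a rule inferred from the reported numbers rather than from any documented pipeline behaviour: reads with END_UNREACHED are dropped, a read with UNDETERMINED in either loop is UNDETERMINED, matching loops keep their category, and everything else is DISCORDANT. The function and variable names are hypothetical.

```python
from collections import defaultdict

# (start category, end category) -> fraction of reads, copied from the
# strand_ratio_category_concordance table of this report.
concordance = {
    ("MIXED", "MIXED"): 0.324140, ("MIXED", "MINUS"): 0.004714,
    ("MIXED", "PLUS"): 0.054290, ("MIXED", "END_UNREACHED"): 0.006204,
    ("MIXED", "UNDETERMINED"): 0.062611,
    ("MINUS", "MIXED"): 0.010230, ("MINUS", "MINUS"): 0.135262,
    ("MINUS", "PLUS"): 0.026237, ("MINUS", "END_UNREACHED"): 0.001169,
    ("MINUS", "UNDETERMINED"): 0.026921,
    ("PLUS", "MIXED"): 0.007893, ("PLUS", "MINUS"): 0.004397,
    ("PLUS", "PLUS"): 0.291141, ("PLUS", "END_UNREACHED"): 0.001279,
    ("PLUS", "UNDETERMINED"): 0.010217,
    ("UNDETERMINED", "MIXED"): 0.015450, ("UNDETERMINED", "MINUS"): 0.000791,
    ("UNDETERMINED", "PLUS"): 0.009439,
    ("UNDETERMINED", "END_UNREACHED"): 0.000464,
    ("UNDETERMINED", "UNDETERMINED"): 0.007149,
}


def consensus_category(start, end):
    """Assumed consensus rule, inferred from the reported numbers."""
    if "UNDETERMINED" in (start, end):
        return "UNDETERMINED"
    if start == end:
        return start
    return "DISCORDANT"


totals = defaultdict(float)
for (start, end), fraction in concordance.items():
    if end == "END_UNREACHED":
        continue  # reads that never reached the end loop are excluded
    totals[consensus_category(start, end)] += fraction

# Renormalize over the reads that reached the end loop.
reached = sum(totals.values())
consensus = {category: value / reached for category, value in totals.items()}

for category, value in consensus.items():
    print(f"{category:<13}{value:.4f}")
```

Under this assumed rule the five fractions agree with the strand_ratio_category_consensus values to within rounding of the six-decimal inputs.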
Statistics table: strand_ratio_category_counts
category | strand_ratio_category_start | strand_ratio_category_end | strand_ratio_category_end_no_unreached |
---|---|---|---|
MIXED | 293891320 | 232607051 | 232607051 |
MINUS | 129935084 | 94394295 | 94394295 |
PLUS | 204785073 | 247819205 | 247819205 |
END_UNREACHED | 0 | 5928018 | 0 |
UNDETERMINED | 21649100 | 69512008 | 69512008 |
Statistics table: strand_ratio_category_norm
category | strand_ratio_category_start | strand_ratio_category_end | strand_ratio_category_end_no_unreached |
---|---|---|---|
MIXED | 0.451959 | 0.357714 | 0.361005 |
MINUS | 0.199820 | 0.145164 | 0.146499 |
PLUS | 0.314928 | 0.381108 | 0.384614 |
END_UNREACHED | 0.000000 | 0.009116 | 0.000000 |
UNDETERMINED | 0.033293 | 0.106899 | 0.107882 |
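The normalized table follows from the counts table by dividing each column by its own total. A minimal sketch in plain Python, with the counts copied from strand_ratio_category_counts (the `normalize` helper is illustrative, not a pipeline function):

```python
# Read counts per category, copied from strand_ratio_category_counts.
counts = {
    "strand_ratio_category_start": {
        "MIXED": 293891320, "MINUS": 129935084, "PLUS": 204785073,
        "END_UNREACHED": 0, "UNDETERMINED": 21649100,
    },
    "strand_ratio_category_end": {
        "MIXED": 232607051, "MINUS": 94394295, "PLUS": 247819205,
        "END_UNREACHED": 5928018, "UNDETERMINED": 69512008,
    },
    # Same as the end column, with END_UNREACHED zeroed so that the
    # fractions are taken over reads that reached the end loop only.
    "strand_ratio_category_end_no_unreached": {
        "MIXED": 232607051, "MINUS": 94394295, "PLUS": 247819205,
        "END_UNREACHED": 0, "UNDETERMINED": 69512008,
    },
}


def normalize(column):
    """Divide every count in a column by the column total."""
    total = sum(column.values())
    return {category: n / total for category, n in column.items()}


norm = {name: normalize(column) for name, column in counts.items()}

print(round(norm["strand_ratio_category_start"]["MIXED"], 6))             # 0.451959
print(round(norm["strand_ratio_category_end"]["END_UNREACHED"], 6))       # 0.009116
print(round(norm["strand_ratio_category_end_no_unreached"]["MIXED"], 6))  # 0.361005
```

The start and end columns share the same total (650260577 reads), so the difference between the last two columns comes only from removing the END_UNREACHED reads from the denominator.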
Statistics table: trimmer_failure_codes
segment | reason | failed_read_count | total_read_count | PCT_failure |
---|---|---|---|---|
First_C | no match | 120 | 873345008 | 0.000014 |
First_C | sequence was too short | 1252 | 873345008 | 0.000143 |
Stem_start | no match | 1549889 | 873345008 | 0.177466 |
Unrecognized_End_loop | sequence was too long | 2877504 | 873345008 | 0.329481 |
Unrecognized_Start_loop | sequence was too long | 587666 | 873345008 | 0.067289 |
insert | sequence was too short | 1187463 | 873345008 | 0.135967 |
start | rsq file | 215129859 | 873345008 | 24.632861 |
start | sequence was too long | 1750678 | 873345008 | 0.200457 |
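PCT_failure is the failed read count expressed as a percentage of total_read_count. The sketch below recomputes it and adds two consistency checks that the numbers in this report happen to satisfy; reading those checks as successive filtering stages is an assumption, not something the report states.

```python
total_read_count = 873345008

# (segment, reason) -> failed_read_count, copied from trimmer_failure_codes.
failed_read_count = {
    ("First_C", "no match"): 120,
    ("First_C", "sequence was too short"): 1252,
    ("Stem_start", "no match"): 1549889,
    ("Unrecognized_End_loop", "sequence was too long"): 2877504,
    ("Unrecognized_Start_loop", "sequence was too long"): 587666,
    ("insert", "sequence was too short"): 1187463,
    ("start", "rsq file"): 215129859,
    ("start", "sequence was too long"): 1750678,
}

# Failure rate as a percentage of all reads seen by the trimmer.
pct_failure = {
    key: 100 * n / total_read_count for key, n in failed_read_count.items()
}
print(round(pct_failure[("Stem_start", "no match")], 6))  # 0.177466

# Check 1: removing the "rsq file" failures leaves exactly the
# PF_Barcode_reads value from the main statistics.
reads_after_rsq = total_read_count - failed_read_count[("start", "rsq file")]
print(reads_after_rsq)  # 658215149

# Check 2: removing all remaining failures leaves the column total of
# strand_ratio_category_counts.
other_failures = sum(
    n for key, n in failed_read_count.items() if key != ("start", "rsq file")
)
print(reads_after_rsq - other_failures)  # 650260577
```

The Stem_start, Unrecognized_Start_loop and insert rows correspond numerically to PCT_failed_unrecognized_start_stem, PCT_failed_unrecognized_start_loop and PCT_failed_adapter_dimers in the main statistics.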
Statistics table: trimmer_histogram
index | strand_ratio_category_start | loop_sequence_start | strand_ratio_category_end | loop_sequence_end | native_adapter_length | count | count_norm |
---|---|---|---|---|---|---|---|
0 | PLUS | AAGGA | PLUS | GGAAC | 1.0 | 154076692 | 0.236946 |
1 | MINUS | TTCCA | MINUS | CCTTC | 1.0 | 83668521 | 0.128669 |
2 | MIXED | TGCA | MIXED | GCATTC | 1.0 | 31514816 | 0.048465 |
3 | PLUS | AAGGA | PLUS | GGAC | 1.0 | 30588563 | 0.047040 |
4 | MIXED | ATGCA | MIXED | GCATTC | 1.0 | 29180882 | 0.044876 |
5 | MIXED | ATGCA | UNDETERMINED | NaN | 1.0 | 23441782 | 0.036050 |
6 | MIXED | ATGCA | MIXED | GCTC | 1.0 | 15754095 | 0.024227 |
7 | MIXED | ATGCA | MIXED | GGCATTC | 1.0 | 15556446 | 0.023923 |
8 | MINUS | TTCCA | UNDETERMINED | NaN | 1.0 | 15400600 | 0.023684 |
9 | MIXED | ATGCA | MIXED | GTC | 1.0 | 14576619 | 0.022417 |
10 | MIXED | TGCA | MIXED | GGCATTC | 1.0 | 13750335 | 0.021146 |
11 | MINUS | TTCCA | PLUS | GGAAC | 1.0 | 13502706 | 0.020765 |
12 | MIXED | TGCA | MIXED | GCTC | 1.0 | 13103811 | 0.020152 |
13 | MIXED | ATGCA | PLUS | GGAAC | 1.0 | 12806975 | 0.019695 |
14 | MIXED | ATGCA | MIXED | GGCTC | 1.0 | 10770725 | 0.016564 |
15 | MIXED | TGCA | UNDETERMINED | NaN | 1.0 | 10331640 | 0.015888 |
16 | MIXED | TGCA | PLUS | GGAAC | 1.0 | 9920185 | 0.015256 |
17 | MIXED | TGCA | MIXED | GTC | 1.0 | 8930061 | 0.013733 |
18 | MIXED | AGCA | MIXED | GCATTC | 1.0 | 8025845 | 0.012343 |
19 | MIXED | TGCA | MIXED | GGCTC | 1.0 | 6721537 | 0.010337 |
20 | PLUS | AAGGA | UNDETERMINED | NaN | 1.0 | 6404236 | 0.009849 |
21 | UNDETERMINED | NaN | MIXED | GCATTC | 1.0 | 6015340 | 0.009251 |
22 | MIXED | ATCA | MIXED | GCATTC | 1.0 | 5429406 | 0.008350 |
23 | MIXED | ACA | MIXED | GCATTC | 1.0 | 4754587 | 0.007312 |
24 | UNDETERMINED | NaN | UNDETERMINED | NaN | 1.0 | 4648468 | 0.007149 |
25 | UNDETERMINED | NaN | PLUS | GGAAC | 1.0 | 3242453 | 0.004986 |
26 | MIXED | ATGCA | PLUS | GGAC | 1.0 | 2807897 | 0.004318 |
27 | UNDETERMINED | NaN | PLUS | GGAC | 1.0 | 2747446 | 0.004225 |
28 | PLUS | AAGGA | MINUS | CCTTC | 1.0 | 2743214 | 0.004219 |
29 | MIXED | ATTGCA | MIXED | GCATTC | 1.0 | 2734818 | 0.004206 |
30 | MINUS | TTCCA | MIXED | GCATTC | 1.0 | 2577225 | 0.003963 |
31 | MINUS | TTCCA | PLUS | GGAC | 1.0 | 2519408 | 0.003874 |
32 | MIXED | ATGCA | MIXED | GGCATC | 1.0 | 2288898 | 0.003520 |
33 | MIXED | ATGCA | MIXED | GCATC | 1.0 | 2217096 | 0.003410 |
34 | MIXED | ATGCA | END_UNREACHED | NaN | NaN | 2199246 | 0.003382 |
35 | PLUS | AAGGA | MIXED | GCATTC | 1.0 | 2082861 | 0.003203 |
36 | MIXED | ATGA | UNDETERMINED | NaN | 1.0 | 1821550 | 0.002801 |
37 | MIXED | TGCA | MIXED | GCATC | 1.0 | 1790798 | 0.002754 |
38 | MIXED | TGCA | PLUS | GGAC | 1.0 | 1713303 | 0.002635 |
39 | MINUS | TTCCA | MINUS | CTTC | 1.0 | 1700989 | 0.002616 |
40 | MIXED | ATGCA | MIXED | GATTC | 1.0 | 1617865 | 0.002488 |
41 | MIXED | AGCA | PLUS | GGAAC | 1.0 | 1584756 | 0.002437 |
42 | MIXED | AATGCA | UNDETERMINED | NaN | 1.0 | 1494533 | 0.002298 |
43 | MIXED | ATGCA | MIXED | GATC | 1.0 | 1466413 | 0.002255 |
44 | MIXED | TGCA | MIXED | GATTC | 1.0 | 1424505 | 0.002191 |
45 | MINUS | ATTCCA | UNDETERMINED | NaN | 1.0 | 1406821 | 0.002163 |
46 | MIXED | ATGCA | MINUS | CCTTC | 1.0 | 1327435 | 0.002041 |
47 | MIXED | ATCA | PLUS | GGAAC | 1.0 | 1221513 | 0.001878 |
48 | PLUS | AAGGA | PLUS | GGAATC | 1.0 | 1209439 | 0.001860 |
49 | UNDETERMINED | NaN | MIXED | GCTC | 1.0 | 1175749 | 0.001808 |