UG100 Whole Genome Sequencing Report

Project name: NA12878_Project_000-GRCh39
Run name: 400000-20250101_0000
Species: Homo_sapiens
Reference genome assembly: unknown
Samples: 1
Mean reads per sample: 415,949,227
Total reads: 415,949,227
Report Generated: Sat Apr 11 09:24:37 CDT 2026
Download data: download.tsv

Summary Metrics

Reads per sample

Reads per sample: the number of reads generated for each sample

Alignment rate (%)

Alignment rate (%): the percent of generated reads that aligned to the reference genome

Duplication rate (%)

Duplication rate (%): percent of reads that are PCR duplicates as determined by the UG100 software (Picard).
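Both rates are ratios of read counts. The sketch below computes them from integer SAM FLAG values. It assumes each read has one primary record, skips secondary (0x100) and supplementary (0x800) records, and treats 0x4 as unmapped and 0x400 as duplicate-marked. Using mapped primary reads as the duplication-rate denominator is an assumption chosen to resemble how Picard MarkDuplicates reports PERCENT_DUPLICATION for unpaired reads; this report does not describe the UG100 software's exact calculation. `read_rates` is a hypothetical helper.

```python
# SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def read_rates(flags):
    """Return (alignment_rate_pct, duplication_rate_pct) for a list of SAM flags.

    Assumption: only primary records are counted, so each read counts once.
    """
    primary = [f for f in flags if not f & (SECONDARY | SUPPLEMENTARY)]
    mapped = [f for f in primary if not f & UNMAPPED]
    duplicates = [f for f in mapped if f & DUPLICATE]
    alignment_rate = 100.0 * len(mapped) / len(primary) if primary else 0.0
    duplication_rate = 100.0 * len(duplicates) / len(mapped) if mapped else 0.0
    return alignment_rate, duplication_rate


if __name__ == "__main__":
    # Four primary reads (one unmapped, one duplicate) plus one secondary record.
    aln, dup = read_rates([0, 16, 4, 1024, 256])
    print(f"{aln:.1f} {dup:.1f}")  # prints "75.0 33.3"
```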

Read Quality

Mean base quality

Mean base quality: the mean Phred quality score of all bases across all reads. A Phred score of 30 or higher is generally considered good; Q30 corresponds to an error rate of 1 in 1,000 bases.
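The Phred scale relates a quality score Q to an error probability P by Q = -10 log10(P), so Q30 means P = 0.001. The sketch below shows both conversions and a mean over every base of every read. It assumes quality strings use the Phred+33 ASCII encoding of FASTQ and SAM/CRAM files; the helper names are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score implied by an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def mean_base_quality(qual_strings, offset: int = 33) -> float:
    """Mean Phred score over every base of every read (Phred+33 assumed)."""
    total = 0
    n_bases = 0
    for qual in qual_strings:
        for char in qual:
            total += ord(char) - offset
            n_bases += 1
    return total / n_bases if n_bases else 0.0


if __name__ == "__main__":
    print(round(phred_to_error(30), 6))  # Q30 -> 0.001 (1 error in 1,000 bases)
    # '?' = Q30, '5' = Q20, '+' = Q10, so the mean is 20.0
    print(mean_base_quality(["?5", "+"]))
```

Averaging over all bases (rather than averaging per-read means) matches the wording "across all reads"; with reads of equal length the two give the same result.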

%bases>Q20

%bases>Q20: the percent of bases with a Phred quality score of 20 or more (an error rate of at most 1 in 100)
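A minimal sketch of this metric, assuming Phred+33 quality strings. It uses "20 or more" (>=) as in the definition, even though the metric name reads ">Q20"; `percent_bases_at_least` is a hypothetical helper.

```python
def percent_bases_at_least(qual_strings, threshold: int = 20, offset: int = 33) -> float:
    """Percent of bases whose Phred score is >= threshold (Phred+33 assumed)."""
    n_bases = 0
    n_passing = 0
    for qual in qual_strings:
        for char in qual:
            n_bases += 1
            if ord(char) - offset >= threshold:
                n_passing += 1
    return 100.0 * n_passing / n_bases if n_bases else 0.0


if __name__ == "__main__":
    # Q30, Q20, Q10, Q40, Q2: three of five bases reach Q20
    print(percent_bases_at_least(["?5+", "I#"]))  # 60.0
```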

Read Length

Mean read length

Mean read length: the average length of reads in the sample

Mean aligned read length

Mean aligned read length: the average length of the part of the read that aligns to the reference genome
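Read length and aligned length can both be derived from an alignment's CIGAR string. In this sketch, read length counts the query-consuming operations M, I, S, = and X (per the SAM specification), and aligned length is the same minus soft clips (S). Counting insertions (I) as part of the aligned portion is an assumption; hard-clipped bases (H) are not in the stored read and are not counted. It applies to mapped reads only, and the helper names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

QUERY_OPS = set("MIS=X")   # operations that consume read (query) bases
ALIGNED_OPS = set("MI=X")  # query-consuming operations except soft clips


def read_lengths(cigar: str):
    """Return (read_length, aligned_length) for one CIGAR string."""
    read_len = aligned_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in QUERY_OPS:
            read_len += n
        if op in ALIGNED_OPS:
            aligned_len += n
    return read_len, aligned_len


def mean_lengths(cigars):
    """Return (mean_read_length, mean_aligned_length) over a list of CIGARs."""
    pairs = [read_lengths(c) for c in cigars]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n


if __name__ == "__main__":
    print(read_lengths("5S90M2I3M"))                # (100, 95)
    print(mean_lengths(["5S90M2I3M", "100M"]))      # (100.0, 97.5)
```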

Coverage

Median coverage

Median coverage: the median sequencing depth across all positions of the reference genome. In general, a coverage of 30x is considered good for whole genome sequencing.
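Genome-wide depth is usually summarized as a histogram mapping each depth to the number of reference positions at that depth. Tools such as `samtools depth -a` can report every position, including zero-depth ones. The sketch below takes the median from such a histogram. It assumes zero-depth positions are included as depth 0, which is required for the median to span "all positions of the reference genome". `median_coverage` is a hypothetical helper.

```python
def _depth_at_rank(depth_hist, rank):
    """Depth of the position at a 0-based rank when positions are sorted by depth."""
    seen = 0
    for depth in sorted(depth_hist):
        seen += depth_hist[depth]
        if seen > rank:
            return depth
    raise ValueError("rank outside histogram")


def median_coverage(depth_hist):
    """Median depth from a histogram {depth: number_of_positions}."""
    total = sum(depth_hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    lower = _depth_at_rank(depth_hist, (total - 1) // 2)
    upper = _depth_at_rank(depth_hist, total // 2)
    return (lower + upper) / 2  # averages the two middle values when total is even


if __name__ == "__main__":
    # Five positions with depths 0, 10, 10, 30, 30
    print(median_coverage({0: 1, 10: 2, 30: 2}))  # 10.0
```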

%>=1x

%>=1x: the percent of the reference genome covered to a depth of at least 1x

%>=5x

%>=10x

%>=10x: the percent of the reference genome covered to a depth of at least 10x

%>=15x

%>=20x

%>=20x: the percent of the reference genome covered to a depth of at least 20x

%>=25x

%>=30x

%>=40x

%>=50x

%>=50x: the percent of the reference genome covered to a depth of at least 50x

%>=60x

%>=70x

%>=80x

%>=90x

%>=100x

%>=100x: the percent of the reference genome covered to a depth of at least 100x
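Every %>=Nx metric above follows the same rule: positions with depth of at least N, divided by all reference positions. A minimal sketch computing all of the listed thresholds from a depth histogram {depth: number_of_positions}, assuming zero-depth positions are included; `percent_at_least` is a hypothetical helper.

```python
# Thresholds reported in this section.
THRESHOLDS = (1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100)


def percent_at_least(depth_hist, thresholds=THRESHOLDS):
    """Map each threshold N to the percent of positions with depth >= N."""
    total = sum(depth_hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    result = {}
    for t in thresholds:
        covered = sum(n for depth, n in depth_hist.items() if depth >= t)
        result[t] = 100.0 * covered / total
    return result


if __name__ == "__main__":
    pct = percent_at_least({0: 1, 10: 2, 30: 2})
    print(pct[1], pct[10], pct[15], pct[40])  # 80.0 80.0 40.0 0.0
```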

VCF Metrics

Number of SNPs

Number of SNPs: the number of single nucleotide polymorphisms (SNPs) called. SNPs not found in dbSNP are potentially novel.

Number of indels

Number of indels: the number of insertions and deletions called. Homopolymer indels (insertions or deletions within a homopolymer run) are less reliable than non-homopolymer indels, which are more likely to be real findings.
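To make the SNP / homopolymer-indel distinction concrete, the sketch below labels a normalized VCF record using the reference sequence. The homopolymer criterion is an assumption for illustration, not Ultima's documented definition. An indel counts as a homopolymer indel here when all inserted or deleted bases are the same nucleotide and that nucleotide matches a neighbouring reference base, so an existing run grows or shrinks. `classify_variant` is hypothetical.

```python
def classify_variant(ref_seq: str, pos: int, ref: str, alt: str) -> str:
    """Label a biallelic VCF record as snp, hmer_indel, non_hmer_indel or other.

    ref_seq: reference contig sequence (uppercase string, 0-based indexing).
    pos: 1-based VCF POS, i.e. the position of the first REF base.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    # Only simple indels sharing a single leading anchor base are handled;
    # MNPs and complex substitutions fall through to "other".
    if len(ref) == len(alt) or min(len(ref), len(alt)) != 1 or ref[0] != alt[0]:
        return "other"
    indel_seq = alt[1:] if len(alt) > len(ref) else ref[1:]
    if len(set(indel_seq)) != 1:
        return "non_hmer_indel"  # e.g. an inserted "TG"
    base = indel_seq[0]
    anchor = ref[0]
    # Reference base right after any deleted bases (for an insertion, the base
    # right after the anchor). pos - 1 is the anchor's 0-based index.
    n_deleted = len(ref) - 1
    next_base = ref_seq[pos + n_deleted : pos + n_deleted + 1]
    # Assumed criterion: the indel base matches a neighbouring reference base.
    if base in (anchor, next_base):
        return "hmer_indel"
    return "non_hmer_indel"


if __name__ == "__main__":
    # Reference "GATTTC" contains the homopolymer run TTT at positions 3-5.
    for record in [(1, "G", "A"), (2, "AT", "A"), (2, "A", "AT"), (2, "A", "AG")]:
        print(record, classify_variant("GATTTC", *record))
    # snp, hmer_indel, hmer_indel, non_hmer_indel
```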

Methods

Libraries were sequenced on an Ultima UG100 sequencing instrument. Demultiplexing, adapter trimming, and alignment to the reference genome were performed on the UG100. Variant calling on the CRAM alignment files was performed with the Ultima Genomics-adapted version of the DeepVariant variant caller (make_examples version 3.1.3; call_variants version 2.2.2; https://hub.docker.com/u/ultimagenomics).

Detailed methods

Docker container versions:
call_variants_2.2.2
make_examples_3.1.3

Example make_examples command
tool \
  --input {params.cramlist} \
  --cram-index {params.crailist} \
  --bed /mnt/intervalfolder/interval{wildcards.interval}.bed \
  --output singlesamples/{wildcards.sample}/{wildcards.interval} \
  --reference /mnt/ref/{params.referencefastaname} \
  --min-base-quality 5 \
  --min-mapq 5 \
  --progress \
  --cgp-min-count-snps 2 \
  --cgp-min-count-hmer-indels 2 \
  --cgp-min-count-non-hmer-indels 2 \
  --cgp-min-fraction-snps 0.12 \
  --cgp-min-fraction-hmer-indels 0.12 \
  --cgp-min-fraction-non-hmer-indels 0.06 \
  --cgp-min-mapping-quality 5 \
  --max-reads-per-region 1500 \
  --assembly-min-base-quality 0 \
  --optimal-coverages 50 --cap-at-optimal-coverage \
  --add-ins-size-channel \
  --gvcf --p-error 0.005
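The --cgp-min-count-* and --cgp-min-fraction-* options set separate candidate-generation thresholds for SNPs, homopolymer indels and non-homopolymer indels. The sketch below assumes they work like DeepVariant's vsc_min_count_* / vsc_min_fraction_* settings: an alternate allele becomes a candidate only if both its supporting-read count and its fraction of reads at the site meet the thresholds for its class. That interpretation, the class names, and `passes_candidate_filter` are assumptions. It also assumes reads below --cgp-min-mapping-quality have already been excluded from both counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateThresholds:
    min_count: int       # minimum number of reads supporting the allele
    min_fraction: float  # minimum supporting reads / total reads at the site


# Values taken from the example make_examples command (assumed semantics).
CGP_THRESHOLDS = {
    "snp": CandidateThresholds(min_count=2, min_fraction=0.12),
    "hmer_indel": CandidateThresholds(min_count=2, min_fraction=0.12),
    "non_hmer_indel": CandidateThresholds(min_count=2, min_fraction=0.06),
}


def passes_candidate_filter(kind: str, supporting_reads: int, total_reads: int) -> bool:
    """Assumed rule: both the count and the fraction threshold must be met."""
    if total_reads <= 0:
        return False
    t = CGP_THRESHOLDS[kind]
    return supporting_reads >= t.min_count and supporting_reads / total_reads >= t.min_fraction


if __name__ == "__main__":
    # 3 of 40 reads (7.5%) passes the 0.06 non-hmer cutoff but not the 0.12 hmer cutoff.
    print(passes_candidate_filter("non_hmer_indel", 3, 40))  # True
    print(passes_candidate_filter("hmer_indel", 3, 40))      # False
```

The lower fraction threshold for non-homopolymer indels is consistent with the report's point that these indels are more likely to be real than homopolymer indels.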

Example post-processing command
ug_postproc \
  --infile {params.gzinputfiles} \
  --ref /mnt/ref/{params.referencefastaname} \
  --outfile {output.vcf} \
  --consider_strand_bias \
  --flow_order TGCA \
  --annotate \
  --bed_annotation_files {params.bedannotationfiles} \
  --qual_filter 1 \
  --filter \
  --filters_file filters.txt \
  --dbsnp {params.dbsnpvcf} \
  --gvcf_outfile {output.gvcf} \
  --nonvariant_site_tfrecord_path {params.tfinputfiles}
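The post-processing command passes --flow_order TGCA. On flow-based instruments such as the UG100, nucleotides are flowed in a repeating order, and each flow reports how many consecutive copies of that base (a homopolymer run) were incorporated. The sketch below converts a base sequence into that idealized per-flow run-length vector. It ignores signal noise, and `to_flow_space` is a hypothetical helper, not part of ug_postproc.

```python
def to_flow_space(sequence: str, flow_order: str = "TGCA") -> list[int]:
    """Return the homopolymer length incorporated at each flow (idealized)."""
    seq = sequence.upper()
    if set(seq) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    counts = []
    i = 0
    flow = 0
    while i < len(seq):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Consume the whole run of the flowed base (0 if the next base differs).
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        counts.append(run)
        flow += 1
    return counts


if __name__ == "__main__":
    print(to_flow_space("TTGA"))   # [2, 1, 0, 1]
    print(to_flow_space("TTTGA"))  # [3, 1, 0, 1]
```

Inserting one T into "TTGA" changes only the first flow value, from 2 to 3. The length of a run must be estimated from a single flow's signal intensity, which offers one intuition for why homopolymer indels are treated as less reliable than non-homopolymer indels.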

call_variants.ini file contents:
[RT classification]
onnxFileName = ../../model.ckpt-890000.dyn_1500.onnx
useSerializedModel = 1
trtWorkspaceSizeMB = 2000
numInferTreadsPerGpu = 2
useGPUs = 1
gpuid = 0

[debug]
logFileFolder = .

[general]
tfrecord = 1
compressed = 1
outputInOneFile = 0
numUncomprThreads = 8
uncomprBufSizeGB = 1
outputFileName = call_variants
numConversionThreads = 2
numExampleFiles = 47

exampleFile1 = 0001.tfrecord.gz
exampleFile2 = 0002.tfrecord.gz

Variant call file statistics were tabulated with bcftools version 1.6 on the VCF files subset to variants with a quality score of 20 or higher, using commands such as the following:
bcftools stats -F {params.uaindexdir}/{params.referencefastaname} {input.vcf} > {output.bcftools_stats}
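The report does not show how the QUAL >= 20 subset was produced. With bcftools, an include expression such as `bcftools view -i 'QUAL>=20'` is one common way. As a language-neutral illustration, the sketch below keeps all header lines and only data lines whose QUAL column (the sixth tab-separated field) is numeric and at least 20; records with QUAL "." are dropped. Function names are hypothetical, and `subset_vcf_file` writes uncompressed output (no bgzip).

```python
import gzip


def passes_qual(record: str, min_qual: float = 20.0) -> bool:
    """True if a VCF data line has a numeric QUAL >= min_qual ('.' fails)."""
    qual = record.rstrip("\n").split("\t")[5]
    return qual != "." and float(qual) >= min_qual


def subset_by_qual(lines, min_qual: float = 20.0):
    """Yield header lines unchanged and only data lines that pass the QUAL cutoff."""
    for line in lines:
        if line.startswith("#") or passes_qual(line, min_qual):
            yield line


def subset_vcf_file(in_path: str, out_path: str, min_qual: float = 20.0) -> None:
    """Apply subset_by_qual to a plain or gzip-compressed VCF; output is uncompressed."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(subset_by_qual(src, min_qual))
```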

Counts of variant types (SNPs, indels, etc.) were tabulated from the VCF files subset to variants with a quality score of 20 or higher, using commands such as the following:
DB_SNPS=$(zcat {input.vcf} | grep "VARIANT_TYPE=snp" | grep "DB" | wc -l)
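The grep pipeline counts any SNP line that contains the substring "DB" anywhere. A stricter sketch parses the INFO column and checks for the DB flag itself, which the VCF specification reserves for dbSNP membership. It also splits all SNPs into known and potentially novel. It assumes SNPs carry the VARIANT_TYPE=snp annotation that the grep relies on; the function names are hypothetical.

```python
def parse_info(info_field: str) -> dict:
    """Parse a VCF INFO column into {key: value}; flag keys such as DB map to True."""
    parsed = {}
    if info_field == ".":
        return parsed
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def count_snps(vcf_lines):
    """Return (all_snps, snps_in_dbsnp, potentially_novel_snps)."""
    total = in_dbsnp = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        info = parse_info(line.rstrip("\n").split("\t")[7])
        if info.get("VARIANT_TYPE") != "snp":
            continue
        total += 1
        if info.get("DB") is True:
            in_dbsnp += 1
    return total, in_dbsnp, total - in_dbsnp
```

For a compressed VCF, the lines can be supplied with `gzip.open(path, "rt")`.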

Data

The outputs for each sample include:
- an aligned CRAM file and its index (.cram.crai file)
- a variant call format (VCF) file
- a VCF file subset to variants with a quality score of 20 or higher
- a genomic variant call format (gVCF) file