Project name: Example_Project_000
Run name: 400000-20250101_1234
Species: Homo sapiens
Reference genome assembly: gatkhg38
Samples: 24
Mean reads per sample: 513,353,738
Total reads: 12,320,489,724
Report Generated: Fri Oct 10 16:49:43 CDT 2025
Download data: download.tsv
Reads per sample: number of reads generated
Alignment rate (%): percent of reads generated that aligned to the reference genome
Duplication rate (%): percent of reads that are PCR duplicates as determined by the UG100 software (Picard).
%bases>Q20: the percent of bases that have a quality score of 20 or more
Mean base quality: the mean Phred score of bases across all reads. Generally, a Phred quality score of 30 or higher is considered good; a score of 30 corresponds to an expected error rate of 1 in 1,000 bases.
Mean read length: the average length of reads in the sample
Mean aligned read length: the average length of the part of the read that aligns to the reference genome
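The relationship between a Phred score and its error rate, mentioned under "Mean base quality", can be sketched as follows. This is a minimal illustration of the standard Phred definition (Q = -10 * log10(p)); the function names are hypothetical and are not part of the report pipeline.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return the probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return the Phred score for an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Q20 and Q30 are the thresholds referenced in the metric definitions.
    for q in (20, 30):
        p = phred_to_error_probability(q)
        print(f"Q{q}: about 1 error in {round(1 / p):,} bases")
```

Under this definition, the %bases>Q20 metric counts bases whose expected error rate is at most 1 in 100.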
Median coverage: the median depth of coverage across all parts of the reference genome. In general, coverage of 30x is considered good for whole genome sequencing.
%>=1x: the percent of the reference genome that is covered at a sequencing depth of 1x or greater
%>=10x: the percent of the reference genome that is covered at a sequencing depth of 10x or greater
%>=20x: the percent of the reference genome that is covered at a sequencing depth of 20x or greater
%>=50x: the percent of the reference genome that is covered at a sequencing depth of 50x or greater
%>=100x: the percent of the reference genome that is covered at a sequencing depth of 100x or greater
%>=500x: the percent of the reference genome that is covered at a sequencing depth of 500x or greater
%>=1000x: the percent of the reference genome that is covered at a sequencing depth of 1000x or greater
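To make the coverage metrics concrete, the sketch below computes the median depth and the percent of positions at or above each threshold from a list of per-base depths. It is an illustration only: the function name and the toy depth values are hypothetical, and the report's actual coverage figures come from the pipeline's own tooling, not from this code.

```python
from statistics import median

# Thresholds matching the columns defined in this report.
THRESHOLDS = (1, 10, 20, 50, 100, 500, 1000)


def coverage_summary(depths, thresholds=THRESHOLDS):
    """Summarize per-base depths (zero-depth positions must be included)."""
    if not depths:
        raise ValueError("depths must not be empty")
    total = len(depths)
    summary = {"median": median(depths)}
    for t in thresholds:
        # Count positions whose depth is at least the threshold.
        covered = sum(1 for d in depths if d >= t)
        summary[f"%>={t}x"] = 100.0 * covered / total
    return summary


if __name__ == "__main__":
    # Toy data: ten positions, not real sequencing output.
    toy_depths = [0, 5, 12, 30, 30, 31, 48, 55, 120, 600]
    for name, value in coverage_summary(toy_depths).items():
        print(f"{name}: {value}")
```

Because each threshold is cumulative, the percentages can only stay level or decrease as the threshold rises.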
Number of single nucleotide polymorphisms. A typical number of SNPs in a human sample is around 4 to 5 million (A global reference for human genetic variation). SNPs not in dbSNP are potentially novel SNPs.
Number of insertions or deletions. A typical number of indels is around 600,000 (Human Genomic Variation). Homopolymer indels are insertions or deletions associated with a homopolymer and are less reliable than non-homopolymer indels, which are more likely to be real findings.
Libraries were sequenced on an Ultima UG100 sequencing instrument. Demultiplexing of reads, trimming of adapter sequences, and alignment to the reference genome were performed on the UG100. Variant calling of the CRAM alignment files was completed using the Ultima Genomics-adapted DeepVariant variant calling software (make_examples version 3.1.3; call_variants version 2.2.2; https://hub.docker.com/u/ultimagenomics).
Docker container versions:
call_variants_2.2.2
make_examples_3.1.3
Example ‘make examples’ command
tool
--input {params.cramlist}
--cram-index {params.crailist}
--bed /mnt/intervalfolder/interval{wildcards.interval}.bed
--output singlesamples/{wildcards.sample}/{wildcards.interval}
--reference /mnt/ref/{params.referencefastaname}
--min-base-quality 5
--min-mapq 5
--progress
--cgp-min-count-snps 2
--cgp-min-count-hmer-indels 2
--cgp-min-count-non-hmer-indels 2
--cgp-min-fraction-snps 0.12
--cgp-min-fraction-hmer-indels 0.12
--cgp-min-fraction-non-hmer-indels 0.06
--cgp-min-mapping-quality 5
--max-reads-per-region 1500
--assembly-min-base-quality 0
--optimal-coverages 50 --cap-at-optimal-coverage
--add-ins-size-channel
--gvcf --p-error 0.005
Example ‘post processing’ command
ug_postproc
--infile {params.gzinputfiles}
--ref /mnt/ref/{params.referencefastaname}
--outfile {output.vcf}
--consider_strand_bias
--flow_order TGCA
--annotate
--bed_annotation_files {params.bedannotationfiles}
--qual_filter 1
--filter
--filters_file filters.txt
--dbsnp {params.dbsnpvcf}
--gvcf_outfile {output.gvcf}
--nonvariant_site_tfrecord_path {params.tfinputfiles}
call_variants.ini file contents:
[RT classification]
onnxFileName = ../../model.ckpt-890000.dyn_1500.onnx
useSerializedModel = 1
trtWorkspaceSizeMB = 2000
numInferTreadsPerGpu = 2
useGPUs = 1
gpuid = 0
[debug]
logFileFolder = .
[general]
tfrecord = 1
compressed = 1
outputInOneFile = 0
numUncomprThreads = 8
uncomprBufSizeGB = 1
outputFileName = call_variants
numConversionThreads = 2
numExampleFiles = 47
exampleFile1 = 0001.tfrecord.gz
exampleFile2 = 0002.tfrecord.gz
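The call_variants.ini contents above use INI syntax, so they can be inspected with a standard parser. The sketch below uses Python's configparser to read an excerpt of the [general] section and compare the declared numExampleFiles with the exampleFileN entries actually present. The function names are hypothetical, and the excerpt contains only keys shown in this report (which lists two of the 47 declared example files).

```python
import configparser
import re

# Excerpt of the [general] section as shown in this report.
INI_EXCERPT = """\
[general]
tfrecord = 1
compressed = 1
numExampleFiles = 47
exampleFile1 = 0001.tfrecord.gz
exampleFile2 = 0002.tfrecord.gz
"""


def read_ini(text):
    """Parse INI text while preserving the case of keys such as numExampleFiles."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # the default would lowercase every key
    parser.read_string(text)
    return parser


def example_files(parser):
    """Return (declared count, example files listed in numeric order)."""
    general = parser["general"]
    declared = general.getint("numExampleFiles")
    numbered = []
    for key, value in general.items():
        match = re.fullmatch(r"exampleFile(\d+)", key)
        if match:
            numbered.append((int(match.group(1)), value))
    listed = [value for _, value in sorted(numbered)]
    return declared, listed


if __name__ == "__main__":
    declared, listed = example_files(read_ini(INI_EXCERPT))
    print(f"declared: {declared}, listed: {len(listed)}")
```

A check like this reports both numbers rather than assuming they agree, which makes a truncated or incomplete configuration easy to spot.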
Variant call file statistics were tabulated with bcftools version 1.6, using a command like the following:
bcftools stats -F {params.uaindexdir}/{params.referencefastaname} {input.vcf} > {output.bcftools_stats}
Counts of variant types (SNPs, Indels, etc.) were tabulated using commands like the following:
DB_SNPS=$(zcat {input.vcf} | grep "VARIANT_TYPE=snp" | grep "DB" | wc -l)
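For illustration, the same kind of count can be sketched in Python. This version assumes, as the grep patterns suggest, that VARIANT_TYPE and DB appear as entries in the VCF INFO column. It is stricter than the grep pipeline because it skips header lines and matches whole INFO entries rather than substrings anywhere on the line, so the two approaches may not give identical counts. The function name is hypothetical.

```python
import gzip


def count_dbsnp_snps(vcf_gz_path):
    """Count records whose INFO column has VARIANT_TYPE=snp and the DB flag."""
    count = 0
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete VCF record
            info_entries = fields[7].split(";")  # INFO is the 8th column
            if "VARIANT_TYPE=snp" in info_entries and "DB" in info_entries:
                count += 1
    return count
```

Calling count_dbsnp_snps on a sample's compressed VCF would return the number of SNPs annotated as present in dbSNP under these assumptions.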
The data included in the outputs for every sample are:
- an aligned cram file and cram.crai file (index)
- a variant call format (vcf) file
- a genomic variant call format (gvcf) file