Project name: NA12878_Project_000-GRCh39
Run name: 400000-20250101_0000
Species: Homo_sapiens
Reference genome assembly: unknown
Samples: 1
Mean reads per sample: 415,949,227
Total reads: 415,949,227
Report Generated: Sat Apr 11 09:24:37 CDT 2026
Download data: download.tsv
Reads per sample: number of reads generated
Alignment rate (%): percent of reads generated that aligned to the reference genome
Duplication rate (%): percent of reads that are PCR duplicates as determined by the UG100 software (Picard).
Mean base quality: the mean Phred score of bases across all reads. Generally, a Phred quality score of 30 or higher is considered good and corresponds to an error rate of 1/1000 bases.
%bases>Q20: the percent of bases that have a quality score of 20 or more
Mean read length: the average length of reads in the sample
Mean aligned read length: the average length of the part of the read that aligns to the reference genome
Median coverage: the median depth of coverage across all parts of the reference genome. In general, coverage of 30x is considered good for whole genome sequencing.
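%>=1x: the percent of the reference genome that is covered at a sequencing depth of 1x or greater
%>=5x: the percent of the reference genome that is covered at a sequencing depth of 5x or greater
%>=10x: the percent of the reference genome that is covered at a sequencing depth of 10x or greater
%>=15x: the percent of the reference genome that is covered at a sequencing depth of 15x or greater
%>=20x: the percent of the reference genome that is covered at a sequencing depth of 20x or greater
%>=25x: the percent of the reference genome that is covered at a sequencing depth of 25x or greater
%>=30x: the percent of the reference genome that is covered at a sequencing depth of 30x or greater
%>=40x: the percent of the reference genome that is covered at a sequencing depth of 40x or greater
%>=50x: the percent of the reference genome that is covered at a sequencing depth of 50x or greater
%>=60x: the percent of the reference genome that is covered at a sequencing depth of 60x or greater
%>=70x: the percent of the reference genome that is covered at a sequencing depth of 70x or greater
%>=80x: the percent of the reference genome that is covered at a sequencing depth of 80x or greater
%>=90x: the percent of the reference genome that is covered at a sequencing depth of 90x or greater
%>=100x: the percent of the reference genome that is covered at a sequencing depth of 100x or greater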
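The coverage metrics above can be illustrated with a minimal sketch. It assumes per-position depths are already available as a list of integers; the function name `coverage_summary` and its inputs are hypothetical and are not part of the pipeline that produced this report.

```python
from statistics import median

# Thresholds mirror the coverage columns defined above.
THRESHOLDS = (1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100)

def coverage_summary(depths, thresholds=THRESHOLDS):
    """Return (median depth, {threshold: percent of positions at >= threshold}).

    `depths` is an assumed input: one integer depth per reference position.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    breadth = {
        t: 100.0 * sum(1 for d in depths if d >= t) / n
        for t in thresholds
    }
    return median(depths), breadth

# Toy example with five positions (made-up values, not data from this run).
med, breadth = coverage_summary([0, 3, 12, 30, 55])
print(med, breadth[1], breadth[30])
```

For a whole genome, per-position lists are impractical; a depth histogram would normally be used instead, but the definitions are the same.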
Number of single nucleotide polymorphisms. SNPs not in dbSNP are potentially novel SNPs.
Number of insertions or deletions. Homopolymer indels are insertions or deletions associated with a homopolymer; they are less reliable than non-homopolymer indels, which are more likely to be real findings.
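The Phred scale used for the base quality metrics above (mean base quality, %bases>Q20) relates a quality score Q to an error probability p by Q = -10 * log10(p). The sketch below shows the conversion in both directions; the function names are illustrative only.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)

# Q30 corresponds to about 1 error in 1,000 bases; Q20 to about 1 in 100.
for q in (20, 30):
    print(q, phred_to_error_prob(q))
```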
Libraries were sequenced on an Ultima UG100 sequencing instrument. Demultiplexing of reads, trimming of adapter sequences, and alignment to the reference genome were performed on the UG100. Variant calling on the CRAM alignment files was completed using the Ultima Genomics-adapted DeepVariant variant calling software (make_examples version 3.1.3; call_variants version 2.2.2; https://hub.docker.com/u/ultimagenomics).
Docker container versions:
call_variants_2.2.2
make_examples_3.1.3
Example ‘make examples’ command:
tool \
  --input {params.cramlist} \
  --cram-index {params.crailist} \
  --bed /mnt/intervalfolder/interval{wildcards.interval}.bed \
  --output singlesamples/{wildcards.sample}/{wildcards.interval} \
  --reference /mnt/ref/{params.referencefastaname} \
  --min-base-quality 5 \
  --min-mapq 5 \
  --progress \
  --cgp-min-count-snps 2 \
  --cgp-min-count-hmer-indels 2 \
  --cgp-min-count-non-hmer-indels 2 \
  --cgp-min-fraction-snps 0.12 \
  --cgp-min-fraction-hmer-indels 0.12 \
  --cgp-min-fraction-non-hmer-indels 0.06 \
  --cgp-min-mapping-quality 5 \
  --max-reads-per-region 1500 \
  --assembly-min-base-quality 0 \
  --optimal-coverages 50 \
  --cap-at-optimal-coverage \
  --add-ins-size-channel \
  --gvcf \
  --p-error 0.005
Example ‘post processing’ command:
ug_postproc \
  --infile {params.gzinputfiles} \
  --ref /mnt/ref/{params.referencefastaname} \
  --outfile {output.vcf} \
  --consider_strand_bias \
  --flow_order TGCA \
  --annotate \
  --bed_annotation_files {params.bedannotationfiles} \
  --qual_filter 1 \
  --filter \
  --filters_file filters.txt \
  --dbsnp {params.dbsnpvcf} \
  --gvcf_outfile {output.gvcf} \
  --nonvariant_site_tfrecord_path {params.tfinputfiles}
call_variants.ini file contents:
[RT classification]
onnxFileName = ../../model.ckpt-890000.dyn_1500.onnx
useSerializedModel = 1
trtWorkspaceSizeMB = 2000
numInferTreadsPerGpu = 2
useGPUs = 1
gpuid = 0
[debug]
logFileFolder = .
[general]
tfrecord = 1
compressed = 1
outputInOneFile = 0
numUncomprThreads = 8
uncomprBufSizeGB = 1
outputFileName = call_variants
numConversionThreads = 2
numExampleFiles = 47
exampleFile1 = 0001.tfrecord.gz
exampleFile2 = 0002.tfrecord.gz
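For readers who want to inspect these settings programmatically, the following is a minimal sketch using Python's standard `configparser`. It assumes the file follows ordinary INI syntax as shown above; the excerpt embedded in `INI_TEXT` and the helper name `load_call_variants_ini` are illustrative, and this is not how call_variants itself reads the file.

```python
import configparser

# Excerpt of the ini contents listed above (only two example files are shown there).
INI_TEXT = """
[RT classification]
onnxFileName = ../../model.ckpt-890000.dyn_1500.onnx
useGPUs = 1
gpuid = 0

[general]
tfrecord = 1
compressed = 1
outputFileName = call_variants
numExampleFiles = 47
exampleFile1 = 0001.tfrecord.gz
exampleFile2 = 0002.tfrecord.gz
"""

def load_call_variants_ini(text):
    """Parse ini text while preserving the mixed-case key names."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # the default would lowercase every key
    parser.read_string(text)
    return parser

cfg = load_call_variants_ini(INI_TEXT)
general = cfg["general"]
declared = general.getint("numExampleFiles")

# Collect whichever exampleFileN keys are actually present in the text.
listed = [
    general[f"exampleFile{i}"]
    for i in range(1, declared + 1)
    if f"exampleFile{i}" in general
]
print(cfg["RT classification"]["onnxFileName"], declared, listed)
```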
Variant call file statistics were tabulated using bcftools version 1.6 on the vcf files subset to variants with quality scores of q20 or higher using commands like the following:
bcftools stats -F {params.uaindexdir}/{params.referencefastaname} {input.vcf} > {output.bcftools_stats}
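The summary-number section of a `bcftools stats` output file consists of tab-separated lines beginning with `SN`. The sketch below pulls those counts into a dictionary; the helper name `parse_summary_numbers` and the sample lines are illustrative (the values are made up, not taken from this run).

```python
def parse_summary_numbers(lines):
    """Collect the 'SN' (summary numbers) records from bcftools stats output.

    Each SN line has the form: SN <tab> id <tab> key: <tab> value
    """
    summary = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue  # skip comments and the other sections (TSTV, IDD, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        key = fields[2].rstrip(":")
        summary[key] = int(fields[3])
    return summary

# Made-up sample lines in the bcftools stats layout.
SAMPLE = [
    "# SN\t[2]id\t[3]key\t[4]value\n",
    "SN\t0\tnumber of samples:\t1\n",
    "SN\t0\tnumber of SNPs:\t1000\n",
    "SN\t0\tnumber of indels:\t200\n",
    "TSTV\t0\t700\t300\t2.33\t700\t300\t2.33\n",
]
print(parse_summary_numbers(SAMPLE))
```

With a real file, the same function can be applied to an open file handle, for example `parse_summary_numbers(open(path))`.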
Counts of variant types (SNPs, indels, etc.) were tabulated from the vcf files subset to variants with quality scores of q20 or higher using commands like the following:
DB_SNPS=$(zcat {input.vcf} | grep "VARIANT_TYPE=snp" | grep "DB" | wc -l)
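The same kind of count can be sketched in Python. This version assumes that `VARIANT_TYPE` is an INFO key and that `DB` is the INFO flag marking dbSNP membership, as the grep patterns above suggest. It matches whole INFO entries rather than substrings, so it is slightly stricter than the grep pipeline and its counts could differ. The function names are illustrative.

```python
import gzip

def is_dbsnp_snp(line):
    """True if a VCF record is typed as a SNP and carries the DB flag in INFO."""
    if line.startswith("#"):
        return False  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    info = fields[7].split(";")
    return "VARIANT_TYPE=snp" in info and "DB" in info

def count_dbsnp_snps(lines):
    """Count the records for which is_dbsnp_snp is true."""
    return sum(1 for line in lines if is_dbsnp_snp(line))

def count_dbsnp_snps_in_file(path):
    """Convenience wrapper for a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        return count_dbsnp_snps(handle)

# Made-up records for illustration only.
RECORDS = [
    "##fileformat=VCFv4.2\n",
    "chr1\t100\trs1\tA\tG\t50\tPASS\tVARIANT_TYPE=snp;DB\n",
    "chr1\t200\t.\tC\tT\t40\tPASS\tVARIANT_TYPE=snp\n",
    "chr1\t300\trs2\tAT\tA\t35\tPASS\tVARIANT_TYPE=h-indel;DB\n",
]
print(count_dbsnp_snps(RECORDS))
```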
The data included in the outputs for every sample are:
- an aligned cram file and cram.crai file (index)
- a variant call format (vcf) file
- a variant call format (vcf) file subset to variants with quality scores of q20 or higher
- a genomic variant call format (gvcf) file