Project name: MOSS_Project_example-BDGP6
Samples (excluding reference genome): 38
Report generated: 02/02/2026 02:24 PM
Species: Drosophila_melanogaster
Genome Assembly: BDGP6
The percentage of reads aligned to the reference genome is shown. These metrics are calculated by Picard CollectAlignmentSummaryMetrics.
Coverage is shown. These metrics are calculated by Picard CollectWgsMetrics.
The standard deviation of coverage is shown. These metrics are calculated by Picard CollectWgsMetrics.
Consequences of variants as determined by the bcftools consequence caller. Double-click a category in the legend to isolate and magnify it. Double-click the legend again to restore all categories.
Integration elements with fewer than 4 detected reads are shown as not detected in the following table.
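The threshold above can be sketched as a small shell helper. This is only an illustration: it assumes that detected reads are counted as the non-empty lines of the filtered .bed file described in Methods, and the function name detection_status is hypothetical rather than part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): report detection status
# for one integration element in one sample.
# Assumption: each non-empty line of the filtered .bed file is one detected read.
detection_status() {
    bed_file=$1
    min_reads=${2:-4}   # threshold of 4 reads, as stated in the report text
    # Count non-empty lines; "n + 0" prints 0 for an empty file
    n_reads=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$bed_file")
    if [ "$n_reads" -ge "$min_reads" ]; then
        echo "detected"
    else
        echo "not detected"
    fi
}
```

It could be called as detection_status {sample}_{insertion}_insertion_sites_filtered.bed, with an optional second argument to try a different threshold.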
A principal coordinates analysis (PCoA) plot is shown. Genetic distance is calculated using the command: plink --bfile PROJECT --distance 1-ibs --out PROJECT --allow-extra-chr
The output folder generated by this analysis pipeline contains the following folders and files:
integration_elements: see Methods for file descriptions
  <integration element>
    <sample>
      bam
      bed
      filtered.bed
      matched_reads
quality_control
  picardalignmentplot.dat: alignment rate for each sample
  picardcoverageplot.dat: mean coverage for each sample
strain_identification
  consequences
  dendrogram
  MOSS_Project_example-BDGP6_csq.txt: consequences of variants, limited to one consequence per gene per sample
  IBS_distance_matrix.csv: distance matrix based on identity by state
  pcoa
bcf: project-specific multi-sample BCF file
vcfs: sample-wise VCF files
gvcfs: sample-wise gVCF files
Reads were aligned to the reference genome using Ultima aligner (ua) version 2.2.1. Variants were called using Ultima make_examples version 3.1.10 (make_examples_3.1.10.sif) and Ultima call_variants version 2.2.4 (call_variants_2.2.4.sif).
Quality of whole genome sequencing and alignment data was assessed with Picard v. 2.25.6 using CollectWgsMetrics and CollectMultipleMetrics tools, respectively.
Integration elements were identified by matching their sequences to reads using seqkit v. 2.9.0 and aligning the matched reads to the reference genome with bowtie2 v. 2.5.4. Output .sam files were converted to .bam files using samtools v. 1.20, and the .bam files were converted to .bed files using bedtools v. 2.31.1. Filtered .bed files were created by excluding alignments with MAPQ < 20.
Integration elements commands:
# find matching reads
seqkit grep -s -P -f {insertion}.txt {sample}.fastq > {sample}_{insertion}_matched_reads.txt
# align matching reads to reference genome
bowtie2 --very-sensitive-local -x {REFERENCE_INDEX} -U {sample}_{insertion}_matched_reads.txt -S {sample}_{insertion}_aligned_reads.sam
# convert sam file to bam file
samtools view -bS {sample}_{insertion}_aligned_reads.sam > {sample}_{insertion}_aligned_reads.bam
# convert bam file to bed file
bedtools bamtobed -i {sample}_{insertion}_aligned_reads.bam > {sample}_{insertion}_insertion_sites.bed
# filter bed file
awk -v threshold=20 '$5 >= threshold' {sample}_{insertion}_insertion_sites.bed > {sample}_{insertion}_insertion_sites_filtered.bed
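As a small worked example of the filtering step, the sketch below applies the same awk filter to a toy BED file. The read names, coordinates, and file names are invented for illustration; the column layout (chromosome, start, end, read name, MAPQ, strand) is the six-column format written by bedtools bamtobed.

```shell
# Work in a temporary directory so the toy files do not clutter the project
workdir=$(mktemp -d)
toy_bed="$workdir/toy_insertion_sites.bed"
toy_filtered="$workdir/toy_insertion_sites_filtered.bed"

# Toy input (made-up reads): chrom, start, end, read name, MAPQ, strand
printf '2L\t1000\t1150\tread_a\t42\t+\n'  > "$toy_bed"
printf '2L\t5000\t5150\tread_b\t3\t-\n'  >> "$toy_bed"
printf '3R\t7000\t7150\tread_c\t20\t+\n' >> "$toy_bed"

# Same filter as above: keep rows whose fifth column (MAPQ) is at least 20
awk -v threshold=20 '$5 >= threshold' "$toy_bed" > "$toy_filtered"

# read_a (MAPQ 42) and read_c (MAPQ 20) are kept; read_b (MAPQ 3) is dropped
cat "$toy_filtered"
```

Because the comparison is >=, an alignment with MAPQ exactly 20 is retained.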
Consequences of variants were annotated using the bcftools version 1.6 consequence caller in haplotype-aware mode. Results were deduplicated to give one consequence per gene rather than one consequence per transcript. The complete command used is: bcftools csq -f <input.fasta> -g <input.gff3>
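The deduplication step can be illustrated with a short sketch. It rests on assumptions that are not confirmed by this report: the consequences are taken to be already exported to a tab-separated table with sample, gene, and consequence columns (a hypothetical layout, not necessarily that of the pipeline's _csq.txt file), and the first consequence encountered for each sample and gene is kept, which may differ from the pipeline's own selection rule. The function name dedup_consequences is likewise hypothetical.

```shell
# Hypothetical helper: keep one consequence per gene per sample.
# Assumed input: tab-separated columns sample, gene, consequence
# (one row per transcript-level consequence).
dedup_consequences() {
    # Print a row only the first time its (sample, gene) pair is seen
    awk -F '\t' '!seen[$1 FS $2]++' "$1"
}
```

For a table in which a sample has two transcript-level consequences for the same gene, only the first of the two rows is printed, while rows for other genes and other samples pass through unchanged.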