 

Variant calling

Description of the methods used to call and filter SNVs, indels and copy number alterations

1.0+

SNVs and indels are called using Strelka 1.0.13 (https://sites.google.com/site/strelkasomaticvariantcaller/). A series of additional filters, developed on the basis of the ICGC benchmarking study, is applied to the Strelka output to remove further false positives. The filtered SNVs and indels are annotated using the Ensembl Variant Effect Predictor (http://www.ensembl.org/info/docs/tools/vep/index.html).

 

SNVs are filtered based on the following criteria:

 

mapping quality for reads to be included in calculation of metrics >= 1

base quality for reads to be included in calculation of metrics >= 10

alternate allele count for a position to be considered in detecting SNV clusters >= 2

alternate allele frequency for a position to be considered in detecting SNV clusters >= 0.05

VariantAlleleCount < 4

VariantAlleleCountControl > 1

DistanceToAlignmentEndMedian < 10.0

DistanceToAlignmentEndMAD < 3.0

VariantMapQualMedian < 40.0

MapQualDiffMedian < -5.0 || MapQualDiffMedian > 5.0

LowMapQual > 0.1

VariantBaseQualMedian < 30.0

VariantStrandBias < 0.02 && StrandBias > 0.02

SNVCluster50 > 2

SNVCluster100 > 4

Repeat >= 12
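
The criteria above can be read as threshold checks on per-variant metrics. The following is a minimal Python sketch, not the pipeline's actual code. It assumes that each condition from VariantAlleleCount onwards flags a call for removal when it is true, and that the metrics for one SNV arrive as a plain dictionary keyed by the names listed above. The function name failed_filters and the dictionary layout are hypothetical.

```python
# Sketch only: assumes each listed condition marks an SNV for removal when true.
# Metric names come from the list above; the dict-based layout is hypothetical.
SNV_FILTERS = {
    "VariantAlleleCount": lambda m: m["VariantAlleleCount"] < 4,
    "VariantAlleleCountControl": lambda m: m["VariantAlleleCountControl"] > 1,
    "DistanceToAlignmentEndMedian": lambda m: m["DistanceToAlignmentEndMedian"] < 10.0,
    "DistanceToAlignmentEndMAD": lambda m: m["DistanceToAlignmentEndMAD"] < 3.0,
    "VariantMapQualMedian": lambda m: m["VariantMapQualMedian"] < 40.0,
    "MapQualDiffMedian": lambda m: m["MapQualDiffMedian"] < -5.0
    or m["MapQualDiffMedian"] > 5.0,
    "LowMapQual": lambda m: m["LowMapQual"] > 0.1,
    "VariantBaseQualMedian": lambda m: m["VariantBaseQualMedian"] < 30.0,
    "StrandBias": lambda m: m["VariantStrandBias"] < 0.02 and m["StrandBias"] > 0.02,
    "SNVCluster50": lambda m: m["SNVCluster50"] > 2,
    "SNVCluster100": lambda m: m["SNVCluster100"] > 4,
    "Repeat": lambda m: m["Repeat"] >= 12,
}


def failed_filters(metrics):
    """Return the names of all filters whose condition is true for this SNV."""
    return [name for name, is_flagged in SNV_FILTERS.items() if is_flagged(metrics)]


# Illustrative metric values, invented for this example.
passing = {
    "VariantAlleleCount": 8,
    "VariantAlleleCountControl": 0,
    "DistanceToAlignmentEndMedian": 35.0,
    "DistanceToAlignmentEndMAD": 12.0,
    "VariantMapQualMedian": 60.0,
    "MapQualDiffMedian": 0.0,
    "LowMapQual": 0.0,
    "VariantBaseQualMedian": 37.0,
    "VariantStrandBias": 0.4,
    "StrandBias": 0.5,
    "SNVCluster50": 0,
    "SNVCluster100": 0,
    "Repeat": 3,
}
failing = dict(passing, VariantAlleleCount=3, MapQualDiffMedian=-7.5)

print(failed_filters(passing))
print(failed_filters(failing))
```

Returning the list of triggered filter names, rather than a single pass/fail flag, makes it easy to see which criterion removed a given call.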

 

Strelka parameters:

 

isSkipDepthFilters = 0

maxInputDepth = 10000

depthFilterMultiple = 3.0

snvMaxFilteredBasecallFrac = 0.4

snvMaxSpanningDeletionFrac = 0.75

indelMaxRefRepeat = 8

indelMaxWindowFilteredBasecallFrac = 0.3

indelMaxIntHpolLength = 14

ssnvPrior = 0.000001

sindelPrior = 0.000001

ssnvNoise = 0.0000005

sindelNoise = 0.000001

ssnvNoiseStrandBiasFrac = 0.5

minTier1Mapq = 20

minTier2Mapq = 5

ssnvQuality_LowerBound = 15

sindelQuality_LowerBound = 30

isWriteRealignedBam = 1

binSize = 25000000
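
To keep these settings reproducible, they can be written to an INI-style configuration file. The sketch below uses Python's standard configparser module and the parameter values listed above. The section name "user" is an assumption about the layout of the Strelka workflow configuration file and should be checked against the template shipped with the installed version. The helper name render_strelka_config is hypothetical.

```python
import configparser
import io

# Values copied from the parameter list above, kept as strings so that
# the notation (e.g. 0.000001) is written out unchanged.
STRELKA_PARAMS = {
    "isSkipDepthFilters": "0",
    "maxInputDepth": "10000",
    "depthFilterMultiple": "3.0",
    "snvMaxFilteredBasecallFrac": "0.4",
    "snvMaxSpanningDeletionFrac": "0.75",
    "indelMaxRefRepeat": "8",
    "indelMaxWindowFilteredBasecallFrac": "0.3",
    "indelMaxIntHpolLength": "14",
    "ssnvPrior": "0.000001",
    "sindelPrior": "0.000001",
    "ssnvNoise": "0.0000005",
    "sindelNoise": "0.000001",
    "ssnvNoiseStrandBiasFrac": "0.5",
    "minTier1Mapq": "20",
    "minTier2Mapq": "5",
    "ssnvQuality_LowerBound": "15",
    "sindelQuality_LowerBound": "30",
    "isWriteRealignedBam": "1",
    "binSize": "25000000",
}


def render_strelka_config(params, section="user"):
    """Render the parameters as INI text (the section name is an assumption)."""
    parser = configparser.ConfigParser()
    # Keep the camelCase keys; configparser lowercases option names by default.
    parser.optionxform = str
    parser[section] = params
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


config_text = render_strelka_config(STRELKA_PARAMS)
print(config_text)
```

Setting optionxform matters here because the parameter names are case-sensitive, and the default behaviour would silently write them in lower case.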

 

 

 

Older versions - redundant

0.9<

Data QC, Trimming & Alignment

FASTQ files were processed using our QC and alignment pipelines as follows:

Data quality assessed using FastQC
Second read for each of the three 251bp paired end MiSeq runs (A3MCW, A4DBD and A4DC6) trimmed to 220 bases.
BWA v0.5.9 in paired end mode (bwa aln, bwa sampe) using default settings
GRCh37 reference from Ensembl v71 with chromosomes renamed using UCSC hg19 naming scheme
Picard v1.105 FixMateInformation
Picard v1.105 MarkDuplicates treating 514F-A and 514F-B as separate libraries
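
The trimming step above (second read cut to 220 bases) can be illustrated with a short Python sketch. This is not the tool the pipeline used, which is not stated here. It assumes simple four-line FASTQ records, and the function names read_fastq, trim_record and trim_fastq are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        yield header, sequence, separator, quality


def trim_record(record, max_length=220):
    """Cut the sequence and quality strings to at most max_length bases."""
    header, sequence, separator, quality = record
    return header, sequence[:max_length], separator, quality[:max_length]


def trim_fastq(in_handle, out_handle, max_length=220):
    """Write a trimmed copy of every record from in_handle to out_handle."""
    for record in read_fastq(in_handle):
        out_handle.write("\n".join(trim_record(record, max_length)) + "\n")


# Tiny in-memory demonstration with a single invented 251-base read.
demo_in = io.StringIO("@read1/2\n" + "A" * 251 + "\n+\n" + "I" * 251 + "\n")
demo_out = io.StringIO()
trim_fastq(demo_in, demo_out)
trimmed_lines = demo_out.getvalue().splitlines()
print(len(trimmed_lines[1]), len(trimmed_lines[3]))
```

The sequence and quality lines are trimmed together so that they stay the same length, which downstream aligners require.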

Somatic SNV calling and filtering

Somatic SNVs were called and filtered as follows:

SomaticSniper v1.0.2 with the following settings: -q 1 -Q 15 -J -r 0.001000 -T 0.850000 -N 2 -s 0.01
False positive filters using the fpfilter.pl script from SomaticSniper/VarScan2 (all apply to tumor reads):
Average variant position in supporting reads relative to read length between 0.1 and 0.9
Strandedness - fraction of supporting reads from forward strand between 0.01 and 0.99
Variant read count >= 4
Variant allele frequency >= 0.05
Difference in average mismatch quality sum between variant and reference reads <= 50
Average mismatch quality sum for variant reads <= 100
Difference in average mapping quality between reference and variant reads <= 30
Difference in average read length between reference and variant reads <= 25
Additional filters
Homopolymer - number of bases in a flanking homopolymer < 5
Normal genotype must contain reference base, e.g. 0/0 or 0/1 but not 1/2
Depth in normal >= 10
Variant read count in normal <= 1
Indel proximity - exclude variants within 40 bp of an indel called by Pindel or samtools mpileup in either the tumour or the normal sample
Exclude SNVs at positions of known SNPs from the NHGRI UniSNP database of uniquely mapped SNPs from dbSNP v129 and HapMap release 27
Somatic score from SomaticSniper >= 40
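
The thresholds listed above can be summarised as two pass/fail checks. The following Python sketch is illustrative only and is not the fpfilter.pl implementation. All dictionary keys are hypothetical names for the metrics described above, the "between" ranges are assumed to be inclusive, and the somatic score cut-off is assumed to belong with the additional filters.

```python
def passes_fpfilter(v):
    """Apply the fpfilter-style thresholds to tumour read metrics (hypothetical keys)."""
    return all([
        0.1 <= v["avg_read_position"] <= 0.9,         # assumed inclusive bounds
        0.01 <= v["forward_strand_fraction"] <= 0.99,  # assumed inclusive bounds
        v["variant_reads"] >= 4,
        v["variant_allele_frequency"] >= 0.05,
        v["mismatch_quality_sum_diff"] <= 50,
        v["variant_mismatch_quality_sum"] <= 100,
        v["mapping_quality_diff"] <= 30,
        v["read_length_diff"] <= 25,
    ])


def passes_additional_filters(v):
    """Apply the additional filters (hypothetical keys)."""
    return all([
        v["flanking_homopolymer_length"] < 5,
        "0" in v["normal_genotype"].split("/"),  # normal genotype contains the reference allele
        v["normal_depth"] >= 10,
        v["normal_variant_reads"] <= 1,
        not v["within_40bp_of_indel"],
        not v["is_known_snp"],
        v["somatic_score"] >= 40,
    ])


# Invented example values for a single candidate SNV.
candidate = {
    "avg_read_position": 0.45,
    "forward_strand_fraction": 0.52,
    "variant_reads": 12,
    "variant_allele_frequency": 0.21,
    "mismatch_quality_sum_diff": 10,
    "variant_mismatch_quality_sum": 35,
    "mapping_quality_diff": 2,
    "read_length_diff": 1,
    "flanking_homopolymer_length": 2,
    "normal_genotype": "0/0",
    "normal_depth": 48,
    "normal_variant_reads": 0,
    "within_40bp_of_indel": False,
    "is_known_snp": False,
    "somatic_score": 67,
}

keep = passes_fpfilter(candidate) and passes_additional_filters(candidate)
print(keep)
```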