Channel: Recent Discussions — GATK-Forum


Error running "gatk/PreProcessingForVariantDiscovery_GATK4" on FireCloud


Dear GATK4 team,

This is Bo from the Broad Institute. I am running "gatk/PreProcessingForVariantDiscovery_GATK4" on my data in FireCloud and got the following error message:

Workflow failed
causedBy:
message: Unable to complete JES Api Request
causedBy:
message: Pipeline 11178155052716254973: Unable to evaluate parameters: parameter "PreProcessingForVariantDiscovery_GATK4.BaseRecalibrator.known_indels_sites_VCFs-0" has invalid value: ["gs://broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.vcf.gz"]

Could you let us know what happened?

Thanks,
Bo

CalculateContamination output table


The CalculateContamination description says:

Calculates the fraction of reads coming from cross-sample contamination, given results from GetPileupSummaries ... this tool estimates contamination based on the signal from ref reads at hom alt sites.

It produces a resulting contamination table that looks something like:

level        contamination           error
whole_bam    0.001012896128155539    1.8192151501912648E-4

I wasn't able to find an explanation of what the output actually is. Of course, I assume that "contamination" and "error" should be as close to 0 as possible, but what exactly are they? Is it contamination from other individuals, estimated from population allele frequencies? What happens when there is a matched normal? Do both samples need to be contaminated? What about contamination of the normal by the tumor? Those seem to be very different problems, but both could be labeled as contamination.
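
For context, a typical invocation pairs GetPileupSummaries with CalculateContamination; when a matched normal is available, its pileup summaries are passed in via --matched-normal (file names here are placeholders):

    gatk GetPileupSummaries -I tumor.bam -V common_biallelic.vcf.gz -L common_biallelic.vcf.gz -O tumor-pileups.table
    gatk GetPileupSummaries -I normal.bam -V common_biallelic.vcf.gz -L common_biallelic.vcf.gz -O normal-pileups.table
    gatk CalculateContamination -I tumor-pileups.table --matched-normal normal-pileups.table -O contamination.table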


Phased Heterozygous SNP


Dear all,

I have difficulties in understanding the genotypes of phased SNPs. Here I have a SNP where only one read has the reference allele and 12 reads have the alternate allele (AD=1,12), yet it is called as a heterozygous SNP.

 chr15  8485088 .   G   T   4936.33 PASS     
 BaseQRankSum=1.82;ClippingRankSum=0;ExcessHet=0;FS=2.399;InbreedingCoeff=0.721;
 MQ=60;MQRankSum=0;QD=32.86;ReadPosRankSum=0.267;SOR=1.167;
 DP=10789;AF=0.013;MLEAC=13;MLEAF=0.012;AN=1300;AC=28    
GT:AD:DP:GQ:PGT:PID:PL  0/1:1,12:13:3:0|1:8485088_G_T:485,0,3

The genotype shown is for a single sample from a multi-sample VCF. Could someone shed light on how to interpret this genotype as heterozygous when only one read carries the reference allele? It should have been called as a homozygous SNP. Is this a bug, or am I missing something? Also, IGV does not show the reference read. (GATK Version=3.7-0-gcfedb67)
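
For reference when reading the record: PL holds Phred-scaled genotype likelihoods in the order 0/0, 0/1, 1/1, and GQ is the gap between the two most likely genotypes, so the sample above decomposes as:

 PL = 485,0,3  ->  0/0: 485, 0/1: 0 (most likely), 1/1: 3
 GQ = 3 = PL(1/1) - PL(0/1)

In other words, the caller favors 0/1 only marginally over 1/1, which the low GQ of 3 reflects.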

RealignerTargetCreator hangs


Hi GATK team!

we have an issue with running RealignerTargetCreator, unfortunately. The command line looks like this:

gatk -T RealignerTargetCreator -R ref.fasta -I /testsample.sorted.bam -nt 32 -o /testsample.intervals
INFO  13:00:59,111 HelpFormatter - ---------------------------------------------------------------------------------------------
INFO  13:00:59,141 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-07-11-g1f763d5, Compiled 2017/07/11 00:01:14
INFO  13:00:59,141 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO  13:00:59,142 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO  13:00:59,142 HelpFormatter - [Thu Jul 20 13:00:58 UTC 2017] Executing on Linux 3.10.0-327.3.1.el7.x86_64 amd64
INFO  13:00:59,142 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11
INFO  13:00:59,170 HelpFormatter - Program Args:  -T RealignerTargetCreator -R ref.fasta -I /testsample.sorted.bam -nt 32 -o /testsample.intervals
INFO  13:00:59,226 HelpFormatter - Executing as user on Linux 3.10.0-327.3.1.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11.
INFO  13:00:59,227 HelpFormatter - Date/Time: 2017/07/20 13:00:59
INFO  13:00:59,227 HelpFormatter - ---------------------------------------------------------------------------------------------
INFO  13:00:59,228 HelpFormatter - ---------------------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/opt/gatk/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console…

After this, the application unfortunately hangs. Running this with GATK v3.7 stable does not work either; we had issues with the bug in HaplotypeCaller's VectorHMM library. Any ideas what we can do?

How to use mutation rate to identify a homozygous or heterozygous mutation?


Dear all,
This may not be very related to the GATK software. We use software to call SNPs and obtain the mutation rate (allele fraction) of each SNP. Are there any standards for deciding whether a mutation is homozygous or heterozygous? For example, setting a threshold so that 40%-80% means a heterozygous mutation, and above 80% or below 40% means a homozygous mutation. Is there an internationally recognized standard?
Thank you for your time.

Spark


In a nutshell, Spark is a piece of software that GATK4 uses to do multithreading, which is a form of parallelization that allows a computer (or cluster of computers) to finish executing a task sooner. You can read more about multithreading and parallelism in GATK here. The Spark software library is open-source and maintained by the Apache Software Foundation. It is very widely used in the computing industry and is one of the most promising technologies for accelerating execution of analysis pipelines.


Not all GATK tools use Spark

Tools that can use Spark generally have a note to that effect in their respective Tool Doc.

- Some GATK tools exist in distinct Spark-capable and non-Spark-capable versions

The "sparkified" versions have the suffix "Spark" at the end of their names. Many of these are still experimental; down the road we plan to consolidate them so that there will be only one version per tool.

- Some GATK tools only exist in a Spark-capable version

Those tools don't have the "Spark" suffix.


You don't need a Spark cluster to run Spark-enabled GATK tools!

If you're working on a "normal" machine (even just a laptop) with multiple CPU cores, the GATK engine can still use Spark to create a virtual standalone cluster in place, and set it to take advantage of however many cores are available on the machine -- or however many you choose to allocate. See the example parameters below and the local-Spark tutorial for more information on how to control this. And if your machine only has a single core, these tools can always be run in single-core mode -- it'll just take longer for them to finish.

To be clear, even the Spark-only tools can be run on regular machines, though in practice a few of them may be prohibitively slow (SV tools and PathSeq). See the Tool Docs for tool-specific recommendations.

If you do have access to a Spark cluster, the Spark-enabled tools are going to be extra happy but you may need to provide some additional parameters to use them effectively. See the cluster-Spark tutorial for more information.

Example command-line parameters

Here are some example arguments you would give to a Spark-enabled GATK tool:

  • --sparkMaster local[*] -> "Run on the local machine using all cores"
  • --sparkMaster local[2] -> "Run on the local machine using two cores"
  • --sparkMaster spark://23.195.26.187:7077 -> "Run on the cluster at 23.195.26.187, port 7077"
  • --sparkRunner GCS --cluster my_cluster -> "Run on my_cluster in Google Dataproc"
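
Putting these together, an invocation of a Spark-capable tool might look like the following sketch (the tool and file names are placeholders; the Spark argument follows the style listed above):

    gatk MarkDuplicatesSpark -I input.bam -O marked_duplicates.bam --sparkMaster 'local[2]'

The quotes around local[2] simply keep the shell from interpreting the square brackets.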

You don't need to install any additional software to use Spark in GATK

All the necessary software for using Spark, whether it's on a local machine or a Spark cluster, is bundled within the GATK itself. Just make sure to invoke GATK using the gatk wrapper script rather than calling the jar directly, because the wrapper will select the appropriate jar file (there are two!) and will set some parameters for you.

HaplotypeCaller on whole genome or chromosome by chromosome: different results


Hi,

I'm working on targeted resequencing data and I'm doing a multi-sample variant calling with the HaplotypeCaller. First, I tried to call the variants in all the targeted regions by doing the calling at one time on a cluster. I thus specified all the targeted regions with the -L option.

Then, as it was taking too long, I decided to split my interval list chromosome by chromosome and do the calling on each chromosome separately. At the end, I merged the VCF files that I had obtained from the per-chromosome callings.
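
To illustrate, the per-chromosome splitting amounts to something like this sketch (the reference, the multi-sample -I arguments and the interval files are simplified to placeholders):

    for CHR in $(seq 1 22); do
        java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref.fasta -I sample.bam -L chr${CHR}.intervals -o chr${CHR}.vcf
    done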

Then, I compared this merged VCF file with the VCF file that I had obtained by calling all the targeted regions at one time. I noticed about 1% variation between the two variant lists, and I can't explain this stochasticity. Any suggestions?

Thanks!

Maguelonne

Is it right to use CombineVariants to combine all sample VCFs together?


I used HaplotypeCaller and VariantFiltration to get each sample's VCF, then used CombineVariants to combine all the VCFs. Your guide describes CombineVariants as follows: "CombineVariants reads in variants records from separate ROD (Reference-Ordered Data) sources and combines them into a single VCF." Does "Reference-Ordered" here mean every chromosome (for reference-based RNA-seq) or every transcript (for de novo RNA-seq)?

Variant analysis on 1 or 2 samples: should I skip the final steps?


Hello all,

I'm a bit confused as to what steps are necessary, and what steps are not going to add much benefit. I have 2 jobs to complete for 2 different research groups we support: 1) Germline short variant discovery on whole exome sequencing (WES) data collected from 1 mouse (1 sample in total), and 2) Germline short variant discovery on whole genome sequencing data (WGS) collected from 2 macaques (2 samples in total).

I have written a wrapper that follows the GATK best practices from fastq preprocessing to HaplotypeCaller with appropriate conditional loops and required files specific to each species and type of sequencing.

According to the GATK workflow, my next steps after running HaplotypeCaller (with --emit-ref-confidence GVCF) in the pipeline are 1) consolidate GVCFs, 2) joint-call the cohort, and 3) VQSR.

So here are my concerns:

  1. Considering I have only 1 or 2 samples - is it pointless doing some/all of these steps? Should I just stick to the variants called in each sample by HaplotypeCaller? Should I remove "--emit-ref-confidence GVCF" and just create a regular VCF? Is it possible to hard filter a regular VCF?
  2. If I do VQSR, I don't know where to find truth sets. Someone has suggested to me that I can use the human truth set in other species because the information taken from the truth set is the profile of what a true SNP looks like, not the position of the SNP - I'm really not sure about this.

I have previously posted this on biostars without success.

Help Me, Obi-Wan.

Thanks in advance.

Kenneth


Evaluating the quality of a variant callset


Introduction

Running through the steps involved in variant discovery (calling variants, joint genotyping and applying filters) produces a variant callset in the form of a VCF file. So what’s next? Technically, that callset is ready to be used in downstream analysis. But before you do that, we recommend running some quality control analyses to evaluate how “good” that callset is.

To be frank, distinguishing between a “good” callset and a “bad” callset is a complex problem. If you knew the absolute truth of what variants are present or not in your samples, you probably wouldn’t be here running variant discovery on some high-throughput sequencing data. Your fresh new callset is your attempt to discover that truth. So how do you know how close you got?

Methods for variant evaluation

There are several methods that you can apply which offer different insights into the probable biological truth, all with their own pros and cons. Possibly the most trusted method is Sanger sequencing of regions surrounding putative variants. However, it is also the least scalable as it would be prohibitively costly and time-consuming to apply to an entire callset. Typically, Sanger sequencing is only applied to validate candidate variants that are judged highly likely. Another popular method is to evaluate concordance against results obtained from a genotyping chip run on the same samples. This is much more scalable, and conveniently also doubles as a quality control method to detect sample swaps. Although it only covers the subset of known variants that the chip was designed for, this method can give you a pretty good indication of both sensitivity (ability to detect true variants) and specificity (not calling variants where there are none). This is something we do systematically for all samples in the Broad’s production pipelines.

The third method, presented here, is to evaluate how your variant callset stacks up against another variant callset (typically derived from other samples) that is considered to be a truth set (sometimes referred to as a gold standard -- these terms are very close and often used interchangeably). The general idea is that key properties of your callset (metrics discussed later in the text) should roughly match those of the truth set. This method is not meant to render any judgments about the veracity of individual variant calls; instead, it aims to estimate the overall quality of your callset and detect any red flags that might be indicative of error.

Underlying assumptions and truthiness*: a note of caution

It should be immediately obvious that there are two important assumptions being made here: 1) that the content of the truth set has been validated somehow and is considered especially trustworthy; and 2) that your samples are expected to have similar genomic content as the population of samples that was used to produce the truth set. These assumptions are not always well-supported, depending on the truth set, your callset, and what they have (or don’t have) in common. You should always keep this in mind when choosing a truth set for your evaluation; it’s a jungle out there. Consider that if anyone can submit variants to a truth set’s database without a well-regulated validation process, and there is no process for removing variants if someone later finds they were wrong (I’m looking at you, dbSNP), you should be extra cautious in interpreting results.
*With apologies to Stephen Colbert.

Validation

So what constitutes validation? Well, the best validation is done with orthogonal methods, meaning that it is done with technology (wetware, hardware, software, etc.) that is not subject to the same error modes as the sequencing process. Calling variants with two callers that use similar algorithms? Great way to reinforce your biases. It won’t mean anything that both give the same results; they could both be making the same mistakes. On the wetlab side, Sanger and genotyping chips are great validation tools; the technology is pretty different, so they tend to make different mistakes. Therefore it means more if they agree or disagree with calls made from high-throughput sequencing.

Matching populations

Regarding the population genomics aspect: it’s complicated -- especially if we’re talking about humans (I am). There’s a lot of interesting literature on this topic; for now let’s just summarize by saying that some important variant calling metrics vary depending on ethnicity. So if you are studying a population with a very specific ethnic composition, you should try to find a truth set composed of individuals with a similar ethnic background, and adjust your expectations accordingly for some metrics.

Similar principles apply to non-human genomic data, with important variations depending on whether you’re looking at wild or domesticated populations, natural or experimentally manipulated lineages, and so on. Unfortunately we can’t currently provide any detailed guidance on this topic, but hopefully this explanation of the logic and considerations involved will help you formulate a variant evaluation strategy that is appropriate for your organism of interest.


Variant evaluation metrics

So let’s say you’ve got your fresh new callset and you’ve found an appropriate truth set. You’re ready to look at some metrics (but don’t worry yet about how; we’ll get to that soon enough). There are several metrics that we recommend examining in order to evaluate your data. The set described here should be considered a minimum and is by no means exclusive. It is nearly always better to evaluate more metrics if you possess the appropriate data to do so -- and as long as you understand why those additional metrics are meaningful. Please don’t try to use metrics that you don’t understand properly, because misunderstandings lead to confusion; confusion leads to worry; and worry leads to too many desperate posts on the GATK forum.

Variant-level concordance and genotype concordance

The relationship between variant-level concordance and genotype concordance is illustrated in this figure.

  • Variant-level concordance (aka % Concordance) gives the percentage of variants in your samples that match (are concordant with) variants in your truth set. It essentially serves as a check of how well your analysis pipeline identified variants contained in the truth set. Depending on what you are evaluating and comparing, the interpretation of percent concordance can vary quite significantly.
    Comparing your sample(s) against genotyping chip results matched per sample allows you to evaluate whether you missed any real variants within the scope of what is represented on the chip. Based on that concordance result, you can extrapolate what proportion you may have missed out of the real variants not represented on the chip.
    If you don't have a sample-matched truth set and you're comparing your sample against a truth set derived from a population, your interpretation of percent concordance will be more limited. You have to account for the fact that some variants that are real in your sample will not be present in the population and that conversely, many variants that are in the population will not be present in your sample. In both cases, "how many" depends on how big the population is and how representative it is of your sample's background.
    Keep in mind that for most tools that calculate this metric, all unmatched variants (present in your sample but not in the truth set) are considered to be false positives. Depending on your trust in the truth set and whether or not you expect to see true, novel variants, these unmatched variants could warrant further investigation -- or they could be artifacts that you should ignore.

  • Genotype concordance is a similar metric but operates at the genotype level. It allows you to evaluate, within a set of variant calls that are present in both your sample callset and your truth set, what proportion of the genotype calls have been assigned correctly. This assumes that you are comparing your sample to a matched truth set derived from the same original sample.

Number of Indels & SNPs and TiTv Ratio

These metrics are widely applicable. The table below summarizes their expected value ranges for Human Germline Data:

Sequencing Type    # of Variants*    TiTv Ratio
WGS                ~4.4M             2.0-2.1
WES                ~41k              3.0-3.3

*for a single sample

  • Number of Indels & SNPs
    The number of variants detected in your sample(s) is counted separately for indels (insertions and deletions) and SNPs (Single Nucleotide Polymorphisms). Many factors can affect this statistic, including whole exome (WES) versus whole genome (WGS) data, cohort size, strictness of filtering through the GATK pipeline, the ethnicity of your sample(s), and even algorithm improvements due to a software update. For reference, a 2015 Nature paper analyzed the number of variants found across various ethnicities in a moderately large cohort. As such, this metric alone is insufficient to confirm data validity, but it can raise warning flags when something went extremely wrong: e.g. 1000 variants in a large-cohort WGS data set, or 4 billion variants in a ten-sample whole-exome set.

  • TiTv Ratio
    This metric is the ratio of transition (Ti) to transversion (Tv) SNPs. If the distribution of transition and transversion mutations were random (i.e. without any biological influence) we would expect a ratio of 0.5, simply because there are twice as many possible transversion mutations as there are transitions. However, in the biological context, it is very common to see a methylated cytosine undergo deamination to become thymine. As this is a transition mutation, it has been shown to raise the expected ratio from 0.5 to ~2.0 for whole genomes. Furthermore, CpG islands, usually found in promoter regions, have higher concentrations of methylcytosines. By including these regions, whole exome sequencing shows an even stronger lean towards transition mutations, with an expected ratio of 3.0-3.3. A significant deviation from the expected values could indicate artifactual variants causing bias. If your TiTv Ratio is too low, your callset likely has more false positives.

    It should also be noted that the TiTv ratio from exome-sequenced data will vary from the expected value based upon the length of flanking sequences. When we analyze exome sequence data, we add some padding (usually 100 bases) around the targeted regions (using the -ip engine argument) because this improves calling of variants that are at the edges of exons (whether inside the exon sequence or in the promoter/regulatory sequence before the exon). These flanking sequences are not subject to the same evolutionary pressures as the exons themselves, so the numbers of transition and transversion mutations lean away from the expected ratio. The amount of "lean" depends on how long the flanking sequence is.

Ratio of Insertions to Deletions (Indel Ratio)

This metric is generally evaluated after filtering for purposes that are specific to your study, and the expected value range depends on whether you're looking for rare or common variants, as summarized in the table below.

Filtering for    Indel Ratio
common           ~1
rare             0.2-0.5

A significant deviation from the expected ratios listed in the table above could indicate a bias resulting from artifactual variants.


Tools for performing variant evaluation

VariantEval

This is the GATK’s main tool for variant evaluation. It is designed to collect and calculate a variety of callset metrics that are organized in evaluation modules, which are listed in the tool doc. For each evaluation module that is enabled, the tool will produce a table containing the corresponding callset metrics based on the specified inputs (your callset of interest and one or more truth sets). By default, VariantEval will run with a specific subset of the available modules (listed below), but all evaluation modules can be enabled or disabled from the command line. We recommend setting the tool to produce only the metrics that you are interested in, because each active module adds to the computational requirements and overall runtime of the tool.

It should be noted that all module calculations only include variants that passed filtering (i.e. FILTER column in your vcf file should read PASS); variants tagged as filtered out will be ignored. It is not possible to modify this behavior. See the example analysis for more details on how to use this tool and interpret its output.
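
A minimal VariantEval command line, assuming GATK3-style invocation with placeholder file names, looks something like this:

    java -jar GenomeAnalysisTK.jar \
        -T VariantEval \
        -R reference.fasta \
        --eval my_callset.vcf \
        --comp truth_set.vcf \
        -o my_callset.eval.grp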

GenotypeConcordance

This tool calculates -- you’ve guessed it -- the genotype concordance between callsets. In earlier versions of GATK, GenotypeConcordance was itself a module within VariantEval. It was converted into a standalone tool to enable more complex genotype concordance calculations.

Picard tools

The Picard toolkit includes two tools that perform similar functions to VariantEval and GenotypeConcordance, respectively called CollectVariantCallingMetrics and GenotypeConcordance. Both are relatively lightweight in comparison to their GATK equivalents; their functionalities are more limited, but they do run quite a bit faster. See the example analysis of CollectVariantCallingMetrics for details on its use and data interpretation. Note that in the coming months, the Picard tools are going to be integrated into the next major version of GATK, so at that occasion we plan to consolidate these two pairs of homologous tools to eliminate redundancy.
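
For comparison, a minimal CollectVariantCallingMetrics command, again with placeholder file names, looks something like this:

    java -jar picard.jar CollectVariantCallingMetrics \
        INPUT=my_callset.vcf \
        DBSNP=dbsnp.vcf \
        OUTPUT=my_callset_metrics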

Which tool should I use?

We recommend Picard's version of each tool for most cases. The GenotypeConcordance tools provide mostly the same information, but Picard's version is preferred by Broadies. Both VariantEval and CollectVariantCallingMetrics produce similar metrics, however the latter runs faster and scales better for larger cohorts. By default, CollectVariantCallingMetrics stratifies by sample, allowing you to see the value of relevant statistics as they pertain to specific samples in your cohort. It includes all metrics discussed here, as well as a few more. On the other hand, VariantEval provides many more metrics beyond the minimum described here for analysis. It should be noted that none of these tools use phasing to determine metrics.

So when should I use CollectVariantCallingMetrics?

  • If you have a very large callset
  • If you want to look at the metrics discussed here and not much else
  • If you want your analysis back quickly

When should I use VariantEval?

  • When you require a more detailed analysis of your callset
  • If you need to stratify your callset by another factor (allele frequency, indel size, etc.)
  • If you need to compare to multiple truth sets at the same time

(howto) Recalibrate variant quality scores = run VQSR


Objective

Recalibrate variant quality scores and produce a callset filtered for the desired levels of sensitivity and specificity.

Prerequisites

  • TBD

Caveats

This document provides a typical usage example including parameter values. However, the values given may not be representative of the latest Best Practices recommendations. When in doubt, please consult the FAQ document on VQSR training sets and parameters, which overrides this document. See that document also for caveats regarding exome vs. whole genomes analysis design.

Steps

  1. Prepare recalibration parameters for SNPs
    a. Specify which call sets the program should use as resources to build the recalibration model
    b. Specify which annotations the program should use to evaluate the likelihood of SNPs being real
    c. Specify the desired truth sensitivity threshold values that the program should use to generate tranches
    d. Determine additional model parameters

  2. Build the SNP recalibration model

  3. Apply the desired level of recalibration to the SNPs in the call set

  4. Prepare recalibration parameters for Indels
    a. Specify which call sets the program should use as resources to build the recalibration model
    b. Specify which annotations the program should use to evaluate the likelihood of Indels being real
    c. Specify the desired truth sensitivity threshold values that the program should use to generate tranches
    d. Determine additional model parameters

  5. Build the Indel recalibration model

  6. Apply the desired level of recalibration to the Indels in the call set


1. Prepare recalibration parameters for SNPs

a. Specify which call sets the program should use as resources to build the recalibration model

For each training set, we use key-value tags to qualify whether the set contains known sites, training sites, and/or truth sites. We also use a tag to specify the prior likelihood that those sites are true (using the Phred scale).

  • True sites training resource: HapMap

This resource is a SNP call set that has been validated to a very high degree of confidence. The program will consider that the variants in this resource are representative of true sites (truth=true), and will use them to train the recalibration model (training=true). We will also use these sites later on to choose a threshold for filtering variants based on sensitivity to truth sites. The prior likelihood we assign to these variants is Q15 (96.84%).

  • True sites training resource: Omni

This resource is a set of polymorphic SNP sites produced by the Omni genotyping array. The program will consider that the variants in this resource are representative of true sites (truth=true), and will use them to train the recalibration model (training=true). The prior likelihood we assign to these variants is Q12 (93.69%).

  • Non-true sites training resource: 1000G

This resource is a set of high-confidence SNP sites produced by the 1000 Genomes Project. The program will consider that the variants in this resource may contain true variants as well as false positives (truth=false), and will use them to train the recalibration model (training=true). The prior likelihood we assign to these variants is Q10 (90%).

  • Known sites resource, not used in training: dbSNP

This resource is a SNP call set that has not been validated to a high degree of confidence (truth=false). The program will not use the variants in this resource to train the recalibration model (training=false). However, the program will use these to stratify output metrics such as Ti/Tv ratio by whether variants are present in dbSNP or not (known=true). The prior likelihood we assign to these variants is Q2 (36.90%).

The default prior likelihood assigned to all other variants is Q2 (36.90%). This low value reflects the fact that the philosophy of the GATK callers is to produce a large, highly sensitive callset that needs to be heavily refined through additional filtering.
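
Each of these resources is passed to VariantRecalibrator (step 2 below) as a single argument that encodes the tags described above, schematically:

-resource:<name>,known=<true/false>,training=<true/false>,truth=<true/false>,prior=<Q> resource.vcf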

b. Specify which annotations the program should use to evaluate the likelihood of SNPs being real

These annotations are included in the information generated for each variant call by the caller. If an annotation is missing (typically because it was omitted from the calling command) it can be added using the VariantAnnotator tool.

  • Coverage (DP): Total (unfiltered) depth of coverage. Note that this statistic should not be used with exome datasets; see caveat detailed in the VQSR arguments FAQ doc.

  • QualByDepth (QD): Variant confidence (from the QUAL field) / unfiltered depth of non-reference samples.

  • FisherStrand (FS): Measure of strand bias (the variation being seen on only the forward or only the reverse strand). More bias is indicative of false positive calls. This complements the StrandOddsRatio (SOR) annotation.

  • StrandOddsRatio (SOR): Measure of strand bias (the variation being seen on only the forward or only the reverse strand). More bias is indicative of false positive calls. This complements the FisherStrand (FS) annotation.

  • MappingQualityRankSumTest (MQRankSum): The rank sum test for mapping qualities. Note that the mapping quality rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

  • ReadPosRankSumTest (ReadPosRankSum): The rank sum test for the distance from the end of the reads. If the alternate allele is only seen near the ends of reads, this is indicative of error. Note that the read position rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

  • RMSMappingQuality (MQ): Estimation of the overall mapping quality of reads supporting a variant call.

  • InbreedingCoeff: Evidence of inbreeding in a population. See caveats regarding population size and composition detailed in the VQSR arguments FAQ doc.

c. Specify the desired truth sensitivity threshold values that the program should use to generate tranches

  • First tranche threshold 100.0

  • Second tranche threshold 99.9

  • Third tranche threshold 99.0

  • Fourth tranche threshold 90.0

Tranches are essentially slices of variants, ranked by VQSLOD, bounded by the threshold values specified in this step. The threshold values themselves refer to the sensitivity we can obtain when we apply them to the call sets that the program uses to train the model. The idea is that the lowest tranche is highly specific but less sensitive (there are very few false positives but potentially many false negatives, i.e. missing calls), and each subsequent tranche in turn introduces additional true positive calls along with a growing number of false positive calls. This allows us to filter variants based on how sensitive we want the call set to be, rather than applying hard filters and then only evaluating how sensitive the call set is using post hoc methods.


2. Build the SNP recalibration model

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \ 
    -T VariantRecalibrator \ 
    -R reference.fa \ 
    -input raw_variants.vcf \ 
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap.vcf \ 
    -resource:omni,known=false,training=true,truth=true,prior=12.0 omni.vcf \ 
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G.vcf \ 
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp.vcf \ 
    -an DP \ 
    -an QD \ 
    -an FS \ 
    -an SOR \ 
    -an MQ \
    -an MQRankSum \ 
    -an ReadPosRankSum \ 
    -an InbreedingCoeff \
    -mode SNP \ 
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \ 
    -recalFile recalibrate_SNP.recal \ 
    -tranchesFile recalibrate_SNP.tranches \ 
    -rscriptFile recalibrate_SNP_plots.R 

Expected Result

This creates several files. The most important file is the recalibration report, called recalibrate_SNP.recal, which contains the recalibration data. This is what the program will use in the next step to generate a VCF file in which the variants are annotated with their recalibrated quality scores. There is also a file called recalibrate_SNP.tranches, which contains the quality score thresholds corresponding to the tranches specified in the original command. Finally, if your installation of R and the other required libraries was done correctly, you will also find some PDF files containing plots. These plots illustrate the distribution of variants according to certain dimensions of the model.

For detailed instructions on how to interpret these plots, please refer to the VQSR method documentation and presentation videos.


3. Apply the desired level of recalibration to the SNPs in the call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \ 
    -T ApplyRecalibration \ 
    -R reference.fa \ 
    -input raw_variants.vcf \ 
    -mode SNP \ 
    --ts_filter_level 99.0 \ 
    -recalFile recalibrate_SNP.recal \ 
    -tranchesFile recalibrate_SNP.tranches \ 
    -o recalibrated_snps_raw_indels.vcf 

Expected Result

This creates a new VCF file, called recalibrated_snps_raw_indels.vcf, which contains all the original variants from the original raw_variants.vcf file, but now the SNPs are annotated with their recalibrated quality scores (VQSLOD) and either PASS or FILTER depending on whether or not they are included in the selected tranche.

Here we are taking the second lowest of the tranches specified in the original recalibration command. This means that we are applying to our data set the level of sensitivity that would allow us to retrieve 99% of true variants from the truth training sets of HapMap and Omni SNPs. If we wanted to be more specific (and therefore have less risk of including false positives, at the risk of missing real sites) we could take the very lowest tranche, which would only retrieve 90% of the truth training sites. If we wanted to be more sensitive (and therefore less specific, at the risk of including more false positives) we could take the higher tranches. In our Best Practices documentation, we recommend taking the second highest tranche (99.9%) which provides the highest sensitivity you can get while still being acceptably specific.
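
For example, to apply the recommended 99.9% tranche instead, you would rerun the exact same command with only the filter level changed:

java -jar GenomeAnalysisTK.jar \
    -T ApplyRecalibration \
    -R reference.fa \
    -input raw_variants.vcf \
    -mode SNP \
    --ts_filter_level 99.9 \
    -recalFile recalibrate_SNP.recal \
    -tranchesFile recalibrate_SNP.tranches \
    -o recalibrated_snps_raw_indels.vcf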


4. Prepare recalibration parameters for Indels

a. Specify which call sets the program should use as resources to build the recalibration model

For each training set, we use key-value tags to qualify whether the set contains known sites, training sites, and/or truth sites. We also use a tag to specify the prior likelihood that those sites are true (using the Phred scale).

  • True sites training resource: Mills

This resource is an Indel call set that has been validated to a high degree of confidence. The program will consider that the variants in this resource are representative of true sites (truth=true), and will use them to train the recalibration model (training=true). The prior likelihood we assign to these variants is Q12 (93.69%).

  • Known sites resource, not used in training: dbSNP

This resource is a call set that has not been validated to a high degree of confidence (truth=false). The program will not use the variants in this resource to train the recalibration model (training=false). However, the program will use these to stratify output metrics such as Ti/Tv ratio by whether variants are present in dbSNP or not (known=true). The prior likelihood we assign to these variants is Q2 (36.90%).

The default prior likelihood assigned to all other variants is Q2 (36.90%). This low value reflects the fact that the philosophy of the GATK callers is to produce a large, highly sensitive callset that needs to be heavily refined through additional filtering.

b. Specify which annotations the program should use to evaluate the likelihood of Indels being real

These annotations are included in the information generated for each variant call by the caller. If an annotation is missing (typically because it was omitted from the calling command) it can be added using the VariantAnnotator tool.

  • Coverage (DP): Total (unfiltered) depth of coverage. Note that this statistic should not be used with exome datasets; see caveat detailed in the VQSR arguments FAQ doc.

  • QualByDepth (QD): Variant confidence (from the QUAL field) / unfiltered depth of non-reference samples.

  • FisherStrand (FS): Measure of strand bias (the variation being seen on only the forward or only the reverse strand). More bias is indicative of false positive calls. This complements the StrandOddsRatio (SOR) annotation.

  • StrandOddsRatio (SOR): Measure of strand bias (the variation being seen on only the forward or only the reverse strand). More bias is indicative of false positive calls. This complements the FisherStrand (FS) annotation.

  • MappingQualityRankSumTest (MQRankSum): The rank sum test for mapping qualities. Note that the mapping quality rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

  • ReadPosRankSumTest (ReadPosRankSum): The rank sum test for the distance from the end of the reads. If the alternate allele is only seen near the ends of reads, this is indicative of error. Note that the read position rank sum test cannot be calculated for sites without a mixture of reads showing both the reference and alternate alleles.

  • InbreedingCoeff: Evidence of inbreeding in a population. See caveats regarding population size and composition detailed in the VQSR arguments FAQ doc.

c. Specify the desired truth sensitivity threshold values that the program should use to generate tranches

  • First tranche threshold 100.0

  • Second tranche threshold 99.9

  • Third tranche threshold 99.0

  • Fourth tranche threshold 90.0

Tranches are essentially slices of variants, ranked by VQSLOD, bounded by the threshold values specified in this step. The threshold values themselves refer to the sensitivity we can obtain when we apply them to the call sets that the program uses to train the model. The idea is that the lowest tranche is highly specific but less sensitive (there are very few false positives but potentially many false negatives, i.e. missing calls), and each subsequent tranche in turn introduces additional true positive calls along with a growing number of false positive calls. This allows us to filter variants based on how sensitive we want the call set to be, rather than applying hard filters and then only evaluating how sensitive the call set is using post hoc methods.

d. Determine additional model parameters

  • Maximum number of Gaussians (-maxGaussians) 4

This is the maximum number of Gaussians (i.e. clusters of variants that have similar properties) that the program should try to identify when it runs the variational Bayes algorithm that underlies the machine learning method. In essence, this limits the number of different "profiles" of variants that the program will try to identify. This number should only be increased for datasets that include very large numbers of variants.


5. Build the Indel recalibration model

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \ 
    -T VariantRecalibrator \ 
    -R reference.fa \ 
    -input recalibrated_snps_raw_indels.vcf \ 
    -resource:mills,known=false,training=true,truth=true,prior=12.0 Mills_and_1000G_gold_standard.indels.b37.vcf  \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp.b37.vcf \
    -an QD \
    -an DP \ 
    -an FS \ 
    -an SOR \ 
    -an MQRankSum \ 
    -an ReadPosRankSum \ 
    -an InbreedingCoeff \
    -mode INDEL \ 
    -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \ 
    --maxGaussians 4 \ 
    -recalFile recalibrate_INDEL.recal \ 
    -tranchesFile recalibrate_INDEL.tranches \ 
    -rscriptFile recalibrate_INDEL_plots.R 

Expected Result

This creates several files. The most important file is the recalibration report, called recalibrate_INDEL.recal, which contains the recalibration data. This is what the program will use in the next step to generate a VCF file in which the variants are annotated with their recalibrated quality scores. There is also a file called recalibrate_INDEL.tranches, which contains the quality score thresholds corresponding to the tranches specified in the original command. Finally, if your installation of R and the other required libraries was done correctly, you will also find some PDF files containing plots. These plots illustrate the distribution of variants according to certain dimensions of the model.

For detailed instructions on how to interpret these plots, please refer to the online GATK documentation.


6. Apply the desired level of recalibration to the Indels in the call set

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \ 
    -T ApplyRecalibration \ 
    -R reference.fa \ 
    -input recalibrated_snps_raw_indels.vcf \ 
    -mode INDEL \ 
    --ts_filter_level 99.0 \ 
    -recalFile recalibrate_INDEL.recal \ 
    -tranchesFile recalibrate_INDEL.tranches \ 
    -o recalibrated_variants.vcf 

Expected Result

This creates a new VCF file, called recalibrated_variants.vcf, which contains all the original variants from the original recalibrated_snps_raw_indels.vcf file, but now the Indels are also annotated with their recalibrated quality scores (VQSLOD) and either PASS or FILTER depending on whether or not they are included in the selected tranche.

Here we are taking the second lowest of the tranches specified in the original recalibration command. This means that we are applying to our data set the level of sensitivity that would allow us to retrieve 99% of true variants from the Mills truth training set of Indels. If we wanted to be more specific (and therefore have less risk of including false positives, at the risk of missing real sites) we could take the very lowest tranche, which would only retrieve 90% of the truth training sites. If we wanted to be more sensitive (and therefore less specific, at the risk of including more false positives) we could take the higher tranches. In our Best Practices documentation, we recommend taking the second highest tranche (99.9%), which provides the highest sensitivity you can get while still being acceptably specific.
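
If you subsequently want a file containing only the variants that passed the tranche filter, a minimal sketch using SelectVariants would be the following (the output file name is a placeholder):

java -jar GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R reference.fa \
    -V recalibrated_variants.vcf \
    --excludeFiltered \
    -o recalibrated_variants_PASS.vcf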

NOT able to pull GATK4.0.5.0 image in Firecloud


even if I set the disk space to 200G ...

2018/06/08 19:16:16 I: Switching to status: pulling-image
2018/06/08 19:16:16 I: Calling SetOperationStatus(pulling-image)
2018/06/08 19:16:16 I: SetOperationStatus(pulling-image) succeeded
2018/06/08 19:16:16 I: Writing new Docker configuration file
2018/06/08 19:16:16 I: Pulling image "broadinstitute/gatk@sha256:76b5037167dac880a9651802dc06c7dcdfd487cfefd6f4db4f86623dd9a01ec9"
2018/06/08 19:19:14 W: "docker --config /tmp/.docker/ pull broadinstitute/gatk@sha256:76b5037167dac880a9651802dc06c7dcdfd487cfefd6f4db4f86623dd9a01ec9" failed: exit status 1
[layer-by-layer pull progress trimmed; the individual layers download and verify, then:]
failed to register layer: Error processing tar file(exit status 1): write /root/.cache/pip/http/c/d/7/5/4/cd754ee3e1f32413f09f243232a69bc3b2d214c7d6bca9509ded9809: no space left on device

ArrayIndexOutOfBoundsException in GenotypeGVCFs on chrX with male/female adapted ploidy


I am attempting to call exomes using GATK 3.8, the new quality model, and AS annotations. However, on chrX I get an ArrayIndexOutOfBoundsException, likely because I am using different ploidies for males and females.

INFO 20:01:42,079 ProgressMeter - X:140994551 3505.0 30.0 s 2.4 h 1.9% 26.9 m 26.4 m

ERROR --
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: 24
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.getNumLikelihoodElements(GeneralPloidyGenotypeLikelihoods.java:440)
at org.broadinstitute.gatk.tools.walkers.genotyper.GeneralPloidyGenotypeLikelihoods.subsetToAlleles(GeneralPloidyGenotypeLikelihoods.java:339)
at org.broadinstitute.gatk.tools.walkers.genotyper.afcalc.IndependentAllelesExactAFCalculator.subsetAlleles(IndependentAllelesExactAFCalculator.java:494)
at org.broadinstitute.gatk.tools.walkers.genotyper.GenotypingEngine.calculateGenotypes(GenotypingEngine.java:292)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:392)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:375)
at org.broadinstitute.gatk.tools.walkers.genotyper.UnifiedGenotypingEngine.calculateGenotypes(UnifiedGenotypingEngine.java:330)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.regenotypeVC(GenotypeGVCFs.java:327)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:305)
at org.broadinstitute.gatk.tools.walkers.variantutils.GenotypeGVCFs.map(GenotypeGVCFs.java:136)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:98)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:323)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.8-0-ge9d806836):

(How to) Map reads to a reference with alternate contigs like GRCh38


Document is in BETA. It may be incomplete and/or inaccurate. Post suggestions to the Comments section and be sure to read about updates also within the Comments section.


image

This exploratory tutorial provides instructions and example data to map short reads to a reference genome with alternate haplotypes. Instructions are suitable for indexing and mapping reads to GRCh38.

► If you are unfamiliar with terms that describe reference genome components, or GRCh38 alternate haplotypes, take a few minutes to study the Dictionary entry Reference Genome Components.

► For an introduction to GRCh38, see Blog#8180.

Specifically, the tutorial uses BWA-MEM to index and map simulated reads for three samples to a mini-reference composed of a GRCh38 chromosome and alternate contig (sections 1–3). We align in an alternate contig aware (alt-aware) manner, which we also call alt-handling. This is the main focus of the tutorial.

The decision to align to a genome with alternate haplotypes has implications for variant calling. We discuss these in section 5 using the callset generated with the optional tutorial steps outlined in section 4. Because we strategically placed a number of SNPs on the sequence used to simulate the reads, in both homologous and divergent regions, we can use the variant calls and their annotations to examine the implications of analysis approaches. To this end, the tutorial fast-forwards through pre-processing and calls variants for a trio of samples that represents the combinations of the two reference haplotypes (the PA and the ALT). This first workflow (tutorial_8017) is suitable for calling variants on the primary assembly but is insufficient for capturing variants on the alternate contigs.

For those who are interested in calling variants on the alternate contigs, we also present a second and a third workflow in section 6. The second workflow (tutorial_8017_toSE) takes the processed BAM from the first workflow, makes some adjustments to the reads to maximize their information, and calls variants on the alternate contig. This approach is suitable for calling on ~75% of the non-HLA alternate contigs or ~92% of loci with non-HLA alternate contigs (see table in section 6). The third workflow (tutorial_8017_postalt) takes the alt-aware alignments from the first workflow and performs a postalt-processing step as well as the same adjustment from the second workflow. Postalt-processing uses the bwa-postalt.js javascript program that Heng Li provides as a companion to BWA. This allows for variant calling on all alternate contigs including HLA alternate contigs.

The tutorial ends by comparing the difference in call qualities from the multiple workflows for the given example data and discusses a few caveats of each approach.

► The three workflows shown in the diagram above are available as WDL scripts in our GATK Tutorials WDL scripts repository.


Jump to a section

  1. Index the reference FASTA for use with BWA-MEM
  2. Include the reference ALT index file
    What happens if I forget the ALT index file?
  3. Align reads with BWA-MEM
    How can I tell if a BAM was aligned with alt-handling?
    What is the pa tag?
  4. (Optional) Add read group information, preprocess to make a clean BAM and call variants
  5. How can I tell whether I should consider an alternate haplotype for a given sample?
    (5.1) Discussion of variant calls for tutorial_8017
  6. My locus includes an alternate haplotype. How can I call variants on alt contigs?
    (6.1) Variant calls for tutorial_8017_toSE
    (6.2) Variant calls for tutorial_8017_postalt
  7. Related resources

Tools involved

  • BWA v0.7.13 or later releases. The tutorial uses v0.7.15.
    Download from here and see Tutorial#2899 for installation instructions.
    The bwa-postalt.js script is within the bwakit folder.

  • Picard tools v2.5.0 or later releases. The tutorial uses v2.5.0.

  • Optional GATK tools. The tutorial uses v3.6.
  • Optional Samtools. The tutorial uses v1.3.1.
  • Optional Gawk, an AWK-like tool that can interpret bitwise SAM flags. The tutorial uses v4.1.3.
  • Optional k8 Javascript shell. The tutorial uses v0.2.3 downloaded from here.

Download example data

Download tutorial_8017.tar.gz, either from the GoogleDrive or from the ftp site. To access the ftp site, leave the password field blank. The data tarball contains the paired FASTQ reads files for three samples. It also contains a mini-reference chr19_chr19_KI270866v1_alt.fasta and corresponding .dict dictionary, .fai index and six BWA indices including the .alt index. The data tarball includes the output files from the workflow that we care most about. These are the aligned SAMs, processed and indexed BAMs and the final multisample VCF callsets from the three presented workflows.

image

The mini-reference contains two contigs subset from human GRCh38: chr19 and chr19_KI270866v1_alt. The ALT contig corresponds to a diverged haplotype of chromosome 19. Specifically, it corresponds to chr19:34350807-34392977, which contains the glucose-6-phosphate isomerase or GPI gene. Part of the ALT contig introduces novel sequence that lacks a corresponding region in the primary assembly.

Using instructions in Tutorial#7859, we simulated paired 2x151 reads to derive reads for three different samples that, when aligned, give roughly 35x coverage for the target primary locus. We derived the sequences from either the 43 kbp ALT contig (sample ALTALT), the corresponding 42 kbp region of the primary assembly (sample PAPA) or both (sample PAALT). Before simulating the reads, we introduced four SNPs to each contig sequence in a deliberate manner so that we can call variants.

► Alternatively, you may instead use the example input files and commands with the full GRCh38 reference. Results will be similar with a handful of reads mapping outside of the mini-reference regions.


1. Index the reference FASTA for use with BWA-MEM

Our example chr19_chr19_KI270866v1_alt.fasta reference already has chr19_chr19_KI270866v1_alt.dict dictionary and chr19_chr19_KI270866v1_alt.fasta.fai index files for use with Picard and GATK tools. BWA requires a different set of index files for alignment. The command below creates five of the six index files we need for alignment. The command calls the index function of BWA on the reference FASTA.

bwa index chr19_chr19_KI270866v1_alt.fasta

This gives .pac, .bwt, .ann, .amb and .sa index files that all have the same chr19_chr19_KI270866v1_alt.fasta basename. Tools recognize index files within the same directory by their identical basename. In the case of BWA, it uses the basename preceding the .fasta suffix and searches for the index file, e.g. with .bwt suffix or .64.bwt suffix. Depending on which of the two choices it finds, it looks for the same suffix for the other index files, e.g. .alt or .64.alt. Lack of a matching .alt index file will cause BWA to map reads without alt-handling. More on this next.

Note that the .64. part is an explicit indication that index files were generated with version 0.6 or later of BWA and are the 64-bit indices (as opposed to files generated by earlier versions, which were 32-bit). This .64. signifier can be added automatically by adding -6 to the bwa index command.
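
For example, the following produces the .64.-named versions of the same indices, and BWA will then expect a matching .64.alt file:

bwa index -6 chr19_chr19_KI270866v1_alt.fasta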


back to top


2. Include the reference ALT index file

Be sure to place the tutorial's mini-ALT index file chr19_chr19_KI270866v1_alt.fasta.alt with the other index files. Also, if it does not already match, change the file basename to match. This is the sixth index file we need for alignment. BWA-MEM uses this file to prioritize primary assembly alignments for reads that can map to both the primary assembly and an alternate contig. See BWA documentation for details.

  • As of this writing (August 8, 2016), the SAM format ALT index file for GRCh38 is available only in the x86_64-linux bwakit download as stated in this bwakit README. The hs38DH.fa.alt file is in the resource-GRCh38 folder.
  • In addition to mapped alternate contig records, the ALT index also contains decoy contig records as unmapped SAM records. This is relevant to the postalt-processing we discuss in section 6.2. As such, the postalt-processing in section 6 also requires the ALT index.

For the tutorial, we subset from hs38DH.fa.alt to create a mini-ALT index, chr19_chr19_KI270866v1_alt.fasta.alt. Its contents are shown below.

image

The record aligns the chr19_KI270866v1_alt contig to the chr19 locus starting at position 34,350,807 and uses CIGAR string nomenclature to indicate the pairwise structure. To interpret the CIGAR string, think of the primary assembly as the reference and the ALT contig sequence as the read. For example, the 11307M at the start indicates 11,307 corresponding sequence bases, either matches or mismatches. The 935S at the end indicates a 935 base softclip for the ALT contig sequence that lacks corresponding sequence in the primary assembly. This is a region that we consider highly divergent or novel. Finally, notice the NM tag that notes the edit distance to the reference.

☞ What happens if I forget the ALT index file?

If you omit the ALT index file from the reference, or if its basename does not match the other index files, then your alignments will be equivalent to the results you would obtain if you ran BWA-MEM with the -j option. The next section gives an example of what this looks like.
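
For comparison purposes only, an alignment with alt-handling explicitly disabled would look like this; the output file name is illustrative:

bwa mem -j chr19_chr19_KI270866v1_alt.fasta 8017_read1.fq 8017_read2.fq > 8017_bwamem_noalt.sam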


back to top


3. Align reads with BWA-MEM

The command below uses an alt-aware version of BWA and maps reads using BWA's maximal exact match (MEM) option. Because the ALT index file is present, the tool prioritizes mapping to the primary assembly over ALT contigs. In the command, the tutorial's chr19_chr19_KI270866v1_alt.fasta serves as reference; one FASTQ holds the forward reads and the other holds the reverse reads.

bwa mem chr19_chr19_KI270866v1_alt.fasta 8017_read1.fq 8017_read2.fq > 8017_bwamem.sam

The resulting file 8017_bwamem.sam contains aligned read records.

  • BWA preferentially maps to the primary assembly any reads that can align equally well to the primary assembly or the ALT contigs as well as any reads that it can reasonably align to the primary assembly even if it aligns better to an ALT contig. Preference is given by the primary alignment record status, i.e. not secondary and not supplementary. BWA takes the reads that it cannot map to the primary assembly and attempts to map them to the alternate contigs. If a read can map to an alternate contig, then it is mapped to the alternate contig as a primary alignment. For those reads that can map to both and align better to the ALT contig, the tool flags the ALT contig alignment record as supplementary (0x800). This is what we call alt-aware mapping or alt-handling.
  • Adding the -j option to the command disables the alt-handling. Reads that can map multiply are given low or zero MAPQ scores.

image

☞ How can I tell if a BAM was aligned with alt-handling?

There are two approaches to this question.

First, you can view the alignments on IGV and compare primary assembly loci with their alternate contigs. The IGV screenshots to the right show how BWA maps reads with (top) or without (bottom) alt-handling.

Second, you can check the alignment SAM. Of two tags that indicate alt-aware alignment, one will persist after preprocessing only if the sample has reads that can map to alternate contigs. The first tag, the AH tag, is in the BAM header section of the alignment file, and is absent after any merging step, e.g. merging with MergeBamAlignment. The second tag, the pa tag, is present for reads that the aligner alt-handles. If a sample does not contain any reads that map equally or preferentially to alternate contigs, then this tag may be absent in a BAM even if the alignments were mapped in an alt-aware manner.
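
Assuming samtools is on your path, two quick checks along these lines are possible; the grep patterns are simple heuristics, not exhaustive tests:

# Check the header for AH tags on alternate contig @SQ lines
samtools view -H 8017_bwamem.sam | grep 'AH:'

# Count alignment records carrying the pa tag
samtools view 8017_bwamem.sam | grep -c 'pa:'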

Here are three headers for comparison where only one indicates alt-aware alignment.

File header for alt-aware alignment. We use this type of alignment in the tutorial.
Each alternate contig's @SQ line in the header will have an AH:* tag to indicate alternate contig handling for that contig. This marking is based on the alternate contig being listed in the .alt index file and alt-aware alignment.
image

File header for -j alignment (alt-handling disabled) for example purposes. We do not perform this type of alignment in the tutorial.
Notice the absence of any special tags in the header.
image

File header for alt-aware alignment after merging with MergeBamAlignment. We use this step in the next section.
Again, notice the absence of any special tags in the header.
image

☞ What is the pa tag?

For BWA v0.7.15, but not v0.7.13, ALT loci alignment records that can align to both the primary assembly and alternate contig(s) will have a pa tag on the primary assembly alignment. For example, read chr19_KI270866v1_alt_4hetvars_26518_27047_0:0:0_0:0:0_931 of the ALTALT sample has five alignment records only three of which have the pa tag as shown below.

image

A brief description of each of the five alignments, in order:

  1. First in pair, primary alignment on the primary assembly; AS=146, pa=0.967
  2. First in pair, supplementary alignment on the alternate contig; AS=151
  3. Second in pair, primary alignment on the primary assembly; AS=120; pa=0.795
  4. Second in pair, supplementary alignment on the primary assembly; AS=54; pa=0.358
  5. Second in pair, supplementary alignment on the alternate contig; AS=151

The pa tag measures how much better a read aligns to its best alternate contig alignment versus its primary assembly (pa) alignment. Specifically, it is the ratio of the primary assembly alignment score over the highest alternate contig alignment score. In our example we have primary assembly alignment scores of 146, 120 and 54 and alternate contig alignment scores of 151 and again 151. This gives us three different pa scores that tag the primary assembly alignments: 146/151=0.967, 120/151=0.795 and 54/151=0.358.

In our tutorial's workflow, MergeBamAlignment may either change an alignment's pa score or add a previously unassigned pa score to an alignment. The result of this is summarized as follows for the same alignments.

  1. pa=0.967 --MergeBamAlignment--> same
  2. none --MergeBamAlignment--> assigns pa=0.967
  3. pa=0.795 --MergeBamAlignment--> same
  4. pa=0.358 --MergeBamAlignment--> replaces with pa=0.795
  5. none --MergeBamAlignment--> assigns pa=0.795

If you want to retain the BWA-assigned pa scores, then add the following options to the workflow commands in section 4.

  • For RevertSam, add ATTRIBUTE_TO_CLEAR=pa.
  • For MergeBamAlignment, add ATTRIBUTES_TO_RETAIN=pa.

In our sample set, after BWA-MEM alignment ALTALT has 1412 pa-tagged alignment records, PAALT has 805 pa-tagged alignment records and PAPA has zero pa-tagged records.


back to top


4. Add read group information, preprocess to make a clean BAM and call variants

The initial alignment file is missing read group information. One way to add that information, which we use in production, is to use MergeBamAlignment. MergeBamAlignment adds back read group information contained in an unaligned BAM and adjusts meta information to produce a clean BAM ready for pre-processing (see Tutorial#6483 for details on our use of MergeBamAlignment). Given the focus here is to showcase BWA-MEM's alt-handling, we refrain from going into the details of all this additional processing. They follow, with some variation, the PairedEndSingleSampleWf pipeline detailed here.

Remember these are simulated reads with simulated base qualities. We simulated the reads in a manner that only introduces the planned mismatches, without any errors. Coverage is good at roughly 35x. All of the base qualities for all of the reads are I, which in Sanger Phred+33 encoding corresponds to an excellent base quality score of Q40 (ASCII 73 minus the offset of 33). We can therefore skip base quality score recalibration (BQSR), since the reads are simulated and the dataset is not large enough for recalibration anyway.

Here are the commands to obtain a final multisample variant callset. The commands are given for one of the samples. Process each of the three samples independently in the same manner [4.1–4.6] until the last GenotypeGVCFs command [4.7].

[4.1] Create unmapped uBAM

java -jar picard.jar RevertSam \
    I=altalt_bwamem.sam O=altalt_u.bam \
    ATTRIBUTE_TO_CLEAR=XS ATTRIBUTE_TO_CLEAR=XA

[4.2] Add read group information to uBAM

java -jar picard.jar AddOrReplaceReadGroups \
    I=altalt_u.bam O=altalt_rg.bam \
    RGID=altalt RGSM=altalt RGLB=wgsim RGPU=shlee RGPL=illumina

[4.3] Merge uBAM with aligned BAM

java -jar picard.jar MergeBamAlignment \
    ALIGNED=altalt_bwamem.sam UNMAPPED=altalt_rg.bam O=altalt_m.bam \
    R=chr19_chr19_KI270866v1_alt.fasta \
    SORT_ORDER=unsorted CLIP_ADAPTERS=false \
    ADD_MATE_CIGAR=true MAX_INSERTIONS_OR_DELETIONS=-1 \
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
    UNMAP_CONTAMINANT_READS=false \
    ATTRIBUTES_TO_RETAIN=XS ATTRIBUTES_TO_RETAIN=XA

[4.4] Flag duplicate reads

java -jar picard.jar MarkDuplicates \
    INPUT=altalt_m.bam OUTPUT=altalt_md.bam METRICS_FILE=altalt_md.bam.txt \
    OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 ASSUME_SORT_ORDER=queryname 

[4.5] Coordinate sort, fix NM and UQ tags and index for clean BAM
As of Picard v2.7.0, released October 17, 2016, SetNmAndUqTags is no longer available. Use SetNmMdAndUqTags instead.

set -o pipefail
java -jar picard.jar SortSam \
    INPUT=altalt_md.bam OUTPUT=/dev/stdout SORT_ORDER=coordinate | \
    java -jar picard.jar SetNmAndUqTags \
    INPUT=/dev/stdin OUTPUT=altalt_snaut.bam \
    CREATE_INDEX=true R=chr19_chr19_KI270866v1_alt.fasta
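
With Picard v2.7.0 and later, the same step would swap in SetNmMdAndUqTags; here is a sketch, assuming the remaining parameters carry over unchanged:

set -o pipefail
java -jar picard.jar SortSam \
    INPUT=altalt_md.bam OUTPUT=/dev/stdout SORT_ORDER=coordinate | \
    java -jar picard.jar SetNmMdAndUqTags \
    INPUT=/dev/stdin OUTPUT=altalt_snaut.bam \
    CREATE_INDEX=true R=chr19_chr19_KI270866v1_alt.fasta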

[4.6] Call SNP and indel variants in emit reference confidence (ERC) mode per sample using HaplotypeCaller

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R chr19_chr19_KI270866v1_alt.fasta \
    -o altalt.g.vcf -I altalt_snaut.bam \
    -ERC GVCF --max_alternate_alleles 3 --read_filter OverclippedRead \
    --emitDroppedReads -bamout altalt_hc.bam

[4.7] Call genotypes on three samples

java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
    -R chr19_chr19_KI270866v1_alt.fasta -o multisample.vcf \
    --variant altalt.g.vcf --variant altpa.g.vcf --variant papa.g.vcf 

The altalt_snaut.bam, HaplotypeCaller's altalt_hc.bam and the multisample multisample.vcf are ready for viewing on IGV.

Before getting into the results in the next section, we have minor comments on two filtering options.

In our tutorial workflows, we turn off MergeBamAlignment's UNMAP_CONTAMINANT_READS option. If set to true, 68 reads become unmapped for PAPA and 40 reads become unmapped for PAALT. These unmapped reads are those reads caught by the UNMAP_CONTAMINANT_READS filter and their mates. MergeBamAlignment defines contaminant reads as those alignments that are overclipped, i.e. that are softclipped on both ends, and that align with less than 32 bases. Changing the MIN_UNCLIPPED_BASES option from the default of 32 to 22 and 23 restores all of these reads for PAPA and PAALT, respectively. Contaminants are obviously absent for these simulated reads. And so we set UNMAP_CONTAMINANT_READS to false to disable this filtering.
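
For illustration, if you did want contaminant unmapping with the relaxed threshold mentioned above, the [4.3] command for the PAPA sample might look like the sketch below; the papa_* file names simply follow the altalt_* naming pattern and are assumed:

java -jar picard.jar MergeBamAlignment \
    ALIGNED=papa_bwamem.sam UNMAPPED=papa_rg.bam O=papa_m.bam \
    R=chr19_chr19_KI270866v1_alt.fasta \
    SORT_ORDER=unsorted CLIP_ADAPTERS=false \
    ADD_MATE_CIGAR=true MAX_INSERTIONS_OR_DELETIONS=-1 \
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant \
    UNMAP_CONTAMINANT_READS=true MIN_UNCLIPPED_BASES=22 \
    ATTRIBUTES_TO_RETAIN=XS ATTRIBUTES_TO_RETAIN=XA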

HaplotypeCaller's --read_filter OverclippedRead option similarly looks for both-end-softclipped alignments, then filters reads aligning with less than 30 bases. The difference is that HaplotypeCaller only excludes the overclipped alignments from its calling and does not remove mapping information nor act on the mate of the filtered alignment. Thus, we keep this read filter for the first workflow. However, for the second and third workflows in section 6, tutorial_8017_toSE and tutorial_8017_postalt, we omit the --read_filter OverclippedRead option from the HaplotypeCaller command. We also omit the --max_alternate_alleles 3 option for simplicity.


back to top


5. How can I tell whether I should consider an alternate haplotype?

image

We consider this question only for our GPI locus, a locus we know has an alternate contig in the reference. Here we use the term locus in its biological sense to refer to a contiguous genomic region of interest. The three samples give the alignment and coverage profiles shown on the right.

What is immediately apparent from the IGV screenshot is that the scenarios that include the alternate haplotype give a distinct pattern of variant sites to the primary assembly much like a fingerprint. These variants are predominantly heterozygous in PAALT, which carries one copy of the alternate haplotype, and homozygous variant in ALTALT, which carries two. Looking closely at the 3' region of the locus, we see some alignment coverage anomalies that also show a distinct pattern. The coverage in some of the highly diverged regions in the primary assembly drops while in others it increases. If we look at the origin of simulated reads in one of the excess coverage regions, we see that they are from two different regions of the alternate contig, which suggests duplicated sequence segments within the alternate locus.

The variation pattern and coverage anomalies on the primary locus suggest an alternate haplotype may be present for the locus. We can then confirm the presence of aligned reads, both supplementary and primary, on the alternate locus. Furthermore, if we count the alignment records for each region, e.g. using samtools idxstats, we see the following metrics.

                        ALT/ALT     PA/ALT     PA/PA   
chr19                     10005      10006     10000     
chr19_KI270866v1_alt       1407        799         0      
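
The per-contig mapped-read counts above come from the indexed BAMs; for one sample, for example:

samtools idxstats altalt_snaut.bam | cut -f1,3

samtools idxstats prints, for each contig, its name, length, number of mapped read segments and number of unmapped segments; cutting fields 1 and 3 keeps the contig name and the mapped count.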

The number of alignments on the alternate locus increases proportionately with alternate contig dosage. All of these factors together suggest that the sample presents an alternate haplotype.

5.1 Discussion of variant calls for tutorial_8017

The three-sample variant callset gives 54 sites on the primary locus and two additional on the alternate locus for 56 variant sites. All of the eight SNP alleles we introduced are called, with six called on the primary assembly and two called on the alternate contig. Of the 15 expected genotype calls, four are incorrect. Namely, four PAALT calls that ought to be heterozygous are called homozygous variant. These are two each on the primary assembly and on the alternate contig in the region that is highly divergent.

► Our production pipelines use genomic intervals lists that exclude GRCh38 alternate contigs from variant calling. That is, variant calling is performed only for contigs of the primary assembly. This calling on even just the primary assembly of GRCh38 brings improvements to analysis results over previous assemblies. For example, if we align and call variants for our simulated reads on GRCh37, we call 50 variant sites with identical QUAL scores to the equivalent calls in our GRCh38 callset. However, this GRCh37 callset is missing six variant calls compared to the GRCh38 callset for the 42 kb locus: the two variant sites on the alternate contig and four variant sites on the primary assembly.

Consider the example variants on the primary locus. The variant calls from the primary assembly include 32 variant sites that are strictly homozygous variant in ALTALT and heterozygous variant in PAALT. The callset represents only those reads from the ALT that can be mapped to the primary assembly.

In contrast, the two variants in regions whose reads can only map to the alternate contig are absent from the primary assembly callset. For this simulated dataset, the primary alignments present on the alternate contig provide enough supporting reads that allow HaplotypeCaller to call the two variants. However, these variant calls have lower-quality annotation metrics than for those simulated in an equal manner on the primary assembly. We will get into why this is in section 6.

Additionally, for our PAALT sample that is heterozygous for an alternate haplotype, the genotype calls in the highly divergent regions are inaccurate. These are called homozygous variant on the primary assembly and on the alternate contig when in fact they are heterozygous variant. These calls have lower genotype scores GQ as well as lower allele depth AD and coverage DP. The table below shows the variant calls for the introduced SNP sites. In blue are the genotype calls that should be heterozygous variant but are instead called homozygous variant.
image

Here is a command to select out the intentional variant sites that uses SelectVariants:

java -jar GenomeAnalysisTK.jar -T SelectVariants \
    -R chr19_chr19_KI270866v1_alt.fasta \
    -V multisample.vcf -o multisample_selectvariants.vcf \
    -L chr19:34,383,500 -L chr19:34,389,485 -L chr19:34,391,800 -L chr19:34,392,600 \
    -L chr19_KI270866v1_alt:32,700 -L chr19_KI270866v1_alt:38,700 \
    -L chr19_KI270866v1_alt:41,700 -L chr19_KI270866v1_alt:42,700 \
    -L chr19:34,383,486 -L chr19_KI270866v1_alt:32,714 


back to top


6. My locus includes an alternate haplotype. How can I call variants on alt contigs?

If you want to call variants on alternate contigs, consider additional data processing to overcome the following problems.

  • Loss of alignments from filtering of overclipped reads.
  • HaplotypeCaller's filtering of alignments whose mates map to another contig. Alt-handling produces many of these types of reads on the alternate contigs.
  • Zero MAPQ scores for alignments that map to two or more alternate contigs. HaplotypeCaller excludes these types of reads from contributing to evidence for variation.

Let us talk about these in more detail.

Ideally, if we are interested in alternate haplotypes, then we would have ensured we were using the most up-to-date analysis reference genome sequence with the latest patch fixes. Also, whatever approach we take to align and preprocess alignments, if we filter any reads as putative contaminants, e.g. with MergeBamAlignment's option to unmap cross-species contamination, then at this point we would want to fish back into the unmapped reads pool and pull out those reads. Specifically, these would have an SA tag indicating mapping to the alternate contig of interest and an FT tag indicating the reason for unmapping was because MergeBamAlignment's UNMAP_CONTAMINANT_READS option identified them as cross-species contamination. Similarly, we want to make sure not to include HaplotypeCaller's --read_filter OverclippedRead option that we use in the first workflow.

image

As section 5.1 shows, variant calls on the alternate contig are of low quality--they have roughly an order of magnitude lower QUAL scores than what should be equivalent variant calls on the primary assembly.

For this exploratory tutorial, we are interested in calling the introduced SNPs with equivalent annotation metrics. Whether they are called on the primary assembly or the alternate contig and whether they are called homozygous variant or heterozygous--let's say these are less important, especially given pinning certain variants from highly homologous regions to one of the loci is nigh impossible with our short reads. To this end, we will use the second workflow shown in the workflows diagram. However, because this solution is limited, we present a third workflow as well.

► We present these workflows solely for exploratory purposes. They do not represent any production workflows.

Tutorial_8017_toSE uses the processed BAM from our first workflow and allows for calling on singular alternate contigs. That is, the workflow is suitable for calling on alternate contigs of loci with only a single alternate contig like our GPI locus. Tutorial_8017_postalt uses the aligned SAM from the first workflow before processing, and requires separate processing before calling. This third workflow allows for calling on all alternate contigs, even on HLA loci that have numerous contigs per primary locus. However, the callset will not be parsimonious. That is, each alternate contig will greedily represent alignments and it is possible the same variant is called for all the alternate loci for a given primary locus as well as on the primary locus. It is up to the analyst to figure out what to do with the resulting calls.

image

The reason for the divide in these two workflows is in the way BWA assigns mapping quality scores (MAPQ) to multimapping reads. Postalt-processing becomes necessary for loci with two or more alternate contigs because the shared alignments between the primary locus and alternate loci will have zero MAPQ scores. Postalt-processing gives non-zero MAPQ scores to the alignment records. The table presents the frequencies of GRCh38 non-HLA alternate contigs per primary locus. It appears that ~75% of non-HLA alternate contigs are singular to ~92% of primary loci with non-HLA alternate contigs. In terms of bases on the primary assembly, of the ~75 megabases that have alternate contigs, ~64 megabases (85%) have singular non-HLA alternate contigs and ~11 megabases (15%) have multiple non-HLA alternate contigs per locus. Our tutorial's example locus falls under this majority.

In both alt-aware mapping and postalt-processing, alternate contig alignments have a predominance of mates that map back to the primary assembly. HaplotypeCaller, for good reason, filters reads whose mates map to a different contig. However, we know that GRCh38 artificially represents alternate haplotypes as separate contigs and BWA-MEM intentionally maps these mates back to the primary locus. For comparable calls on alternate contigs, we need to include these alignments in calling. To this end, we have devised a temporary workaround.

6.1 Variant calls for tutorial_8017_toSE

Here we are only aiming for equivalent calls with similar annotation values for the two variants that are called on the alternate contig. For the solution that we will outline, here are the results.

image

Including the mate-mapped-to-other-contig alignments bolsters the variant call qualities for the two SNPs HaplotypeCaller calls on the alternate locus. We see the AD allele depths much improved for ALTALT and PAALT. Corresponding to the increase in reads, the GQ genotype quality and the QUAL score (highlighted in red) indicate higher qualities. For example, the QUAL scores increase from 332 and 289 to 2166 and 1764, respectively. We also see that one of the genotype calls changes. For sample ALTALT, we see a previous no call is now a homozygous reference call (highlighted in blue). This hom-ref call is further from the truth than not having a call as the ALTALT sample should not have coverage for this region in the primary assembly.

For our example data, tutorial_8017's callset subset for the primary assembly and tutorial_8017_toSE's callset subset for the alternate contigs together appear to make for a better callset.

What solution did we apply? As the workflow's name toSE implies, this approach converts paired reads to single-end reads. Specifically, it takes the processed and coordinate-sorted BAM from the first workflow and removes the 0x1 paired flag from the alignments. Removing the 0x1 flag allows HaplotypeCaller to consider alignments whose mates map to a different contig. We accomplish this using a modified version of the script presented in Biostars post https://www.biostars.org/p/106668/, indexing with Samtools and then calling with HaplotypeCaller as follows. Note that this workaround creates an invalid BAM according to ValidateSamFile. Another caveat is that because HaplotypeCaller uses softclipped sequences, any overlapping regions of read pairs will count twice towards variation instead of once. Thus, this step may lead to overconfident calls in such regions.

Remove the 0x1 bitwise flag from alignments

samtools view -h altalt_snaut.bam | gawk '{printf "%s\t", $1; if(and($2,0x1))
{t=$2-0x1}else{t=$2}; printf "%s\t" , t; for (i=3; i<NF; i++){printf "%s\t", $i} ; 
printf "%s\n",$NF}'| samtools view -Sb - > altalt_se.bam

Index the resulting BAM

samtools index altalt_se.bam

Call variants in -ERC GVCF mode with HaplotypeCaller for each sample

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
    -R chr19_chr19_KI270866v1_alt.fasta \
    -I altalt_se.bam -o altalt_hc.g.vcf \
    -ERC GVCF --emitDroppedReads -bamout altalt_hc.bam

Finally, use GenotypeGVCFs as shown in section 4's command [4.7] for a multisample variant callset. Tutorial_8017_toSE calls 68 variant sites--66 on the primary assembly and two on the alternate contig.

6.2 Variant calls for tutorial_8017_postalt

BWA's postalt-processing requires the query-grouped output of BWA-MEM. Piping an alignment step with postalt-processing is possible. However, to be able to compare variant calls from an identical alignment, we present the postalt-processing as an add-on workflow that takes the alignment from the first workflow.

The command uses the bwa-postalt.js script, which we run through k8, a Javascript execution shell. It takes the ALT index and the aligned SAM altalt.sam as arguments, and we redirect the output to altalt_postalt.sam.

k8 bwa-postalt.js \
    chr19_chr19_KI270866v1_alt.fasta.alt \
    altalt.sam > altalt_postalt.sam

The resulting postalt-processed SAM, altalt_postalt.sam, undergoes the same processing as the first workflow (commands 4.1 through 4.7), except that (i) we omit the --max_alternate_alleles 3 and --read_filter OverclippedRead options for the HaplotypeCaller command, as in section 6.1, and (ii) we perform the 0x1 flag-removal step from section 6.1.

The effect of this postalt-processing is immediately apparent in the IGV screenshots. Previously empty regions are now filled with alignments. Look closely at the highly divergent region of the primary locus. Do you notice a change, albeit subtle, before and after postalt-processing for samples ALTALT and PAALT?

These alignments give the calls below for our SNP sites of interest. Notice that calls are now made for more sites: the equivalent site, where present, in addition to the design site (highlighted in the first two columns). For the three pairs of sites that can be called on either the primary locus or the alternate contig, the variant site QUALs, the INFO field annotation metrics and the sample-level annotation values are identical for each pair.

[image: table of variant calls for tutorial_8017_postalt at the SNP sites of interest]
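
To pull the per-site QUALs and sample-level values from a callset for this kind of side-by-side comparison, here is a minimal sketch using bcftools; the VCF file name is a placeholder:

# print contig, position, site QUAL, then GT:AD for each sample
bcftools query -f '%CHROM\t%POS\t%QUAL[\t%SAMPLE=%GT:%AD]\n' multisample_postalt.vcf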

Postalt-processing lowers the MAPQ of primary locus alignments in the highly divergent region when they map better to the alternate locus. You can see this as a subtle change in the IGV screenshot: after postalt-processing, there is an increase in white, zero-MAPQ reads in the highly divergent region of the primary locus for ALTALT and PAALT. For ALTALT, this effectively cleans up the variant calls in this region at chr19:34,391,800 and chr19:34,392,600. Previously, these ALTALT calls drew on some reads: 4 and 25 for the first workflow and 0 and 28 for the second workflow. After postalt-processing, no reads are considered in this region, giving ./.:0,0:0:.:0,0,0 calls for both sites.
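
You can quantify this MAPQ shift directly. A minimal sketch, assuming the postalt-processed alignments have been converted to a coordinate-sorted, indexed BAM named altalt_postalt.bam, and with region coordinates that only approximate the divergent region:

# count zero-MAPQ alignments in the divergent region (SAM field 5 is MAPQ)
samtools view altalt_postalt.bam chr19:34391000-34393000 \
    | awk '$5 == 0' | wc -l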

What we omit from examination are the effects of postalt-processing on decoy contig alignments. Namely, if an alignment on the primary assembly aligns better on a decoy contig, then postalt-processing discounts the alignment on the primary assembly by assigning it a zero MAPQ score.

To wrap up, here are the numbers of variant sites called by the three workflows. As you can see, this last workflow calls the most variants, at 95 variant sites, with 62 on the primary assembly and 33 on the alternate contig. A quick way to reproduce such per-contig tallies is sketched after the table.

Workflow                total    on primary assembly    on alternate contig
tutorial_8017             56             54                       2
tutorial_8017_toSE        68             66                       2
tutorial_8017_postalt     95             62                      33
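
To reproduce such per-contig tallies from any of the callsets, a minimal sketch follows; the VCF name is a placeholder:

# count called sites per contig, excluding header lines
grep -v '^#' multisample.vcf | cut -f1 | sort | uniq -c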




7. Related resources

  • For WDL scripts of the workflows represented in this tutorial, see the GATK WDL scripts repository.
  • To revert an aligned BAM to unaligned BAM, see Section B of Tutorial#6484.
  • To simulate reads from a reference contig, see Tutorial#7859.
  • Dictionary entry Reference Genome Components reviews terminology that describes reference genome components.
  • The GATK resource bundle provides an analysis set GRCh38 reference FASTA as well as several other related resource files.
  • As of this writing (August 8, 2016), the SAM format ALT index file for GRCh38 is available only in the x86_64-linux bwakit download as stated in this bwakit README. The hs38DH.fa.alt file is in the resource-GRCh38 folder. Rename this file's basename to match that of the corresponding reference FASTA.
  • For more details on MergeBamAlignment features, see Section 3C of Tutorial#6483.
  • For details on the PairedEndSingleSampleWorkflow that uses GRCh38, see here.
  • See here for VCF specifications.


