Hi there,
I'm having some issues getting Picard to work. I want to run the CollectRnaSeqMetrics tool, which requires a refFlat file as input, and I can't get Picard to accept the refFlat file I've made.
So far what I've done is convert a .gff3 file to .genePred using UCSC's gff3ToGenePred tool. Whilst this has seemingly worked, the genePred format has more fields than are required for refFlat. I have thus removed/rearranged a number of columns so that the structure of the .genePred file mirrors that of refFlat. This is based on the following guide that details the refFlat structure/format:
The schema for the refFlat file/table is:
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| geneName   | varchar(255)     | NO   | MUL | NULL    |       |
| name       | varchar(255)     | NO   | MUL | NULL    |       |
| chrom      | varchar(255)     | NO   | MUL | NULL    |       |
| strand     | char(1)          | NO   |     | NULL    |       |
| txStart    | int(10) unsigned | NO   |     | NULL    |       |
| txEnd      | int(10) unsigned | NO   |     | NULL    |       |
| cdsStart   | int(10) unsigned | NO   |     | NULL    |       |
| cdsEnd     | int(10) unsigned | NO   |     | NULL    |       |
| exonCount  | int(10) unsigned | NO   |     | NULL    |       |
| exonStarts | longblob         | NO   |     | NULL    |       |
| exonEnds   | longblob         | NO   |     | NULL    |       |
+------------+------------------+------+-----+---------+-------+
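For illustration, the rearrangement from genePred to this layout could look something like the Python sketch below. It rests on assumptions: that the gff3ToGenePred output is tab-separated extended genePred (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, ...), and that name2 in the 12th column is the value wanted as geneName. The function name genepred_ext_to_refflat is made up for this example.

```python
def genepred_ext_to_refflat(line):
    """Turn one tab-separated extended genePred line into a refFlat line.

    Assumption: column 12 (name2) holds the gene name, and the first ten
    columns are already in refFlat order (name ... exonEnds).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(
            "expected at least 12 tab-separated fields, got %d" % len(fields)
        )
    gene_name = fields[11]  # name2 in extended genePred (assumed)
    # refFlat = geneName followed by the ten standard genePred columns
    return "\t".join([gene_name] + fields[:10])


# Hypothetical input line, only to show the shape of the data
example = "tx1\tchr1\t+\t0\t525\t0\t525\t1\t0,\t525,\t0\tgeneA\tcmpl\tcmpl\t0,"
print(genepred_ext_to_refflat(example))
```

The output is written with tabs between the 11 columns, so no column can be split or merged by stray spaces.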
My data frame is set out like the above schema, with 11 columns in total corresponding to geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds. Still, I get the following error message when running Picard's CollectRnaSeqMetrics tool:
[Thu Sep 14 11:15:50 GMT 2017] picard.analysis.CollectRnaSeqMetrics REF_FLAT=/users/work/jake/Picard/All_morph_no_trimmo_0.5TPM_reorder.refFlat STRAND_SPECIFICITY=NONE CHART_OUTPUT=B11_oy_rCc2_clean_graph.pdf INPUT=/users/work/jake/STAR_bam_output/B11_oy_rCc2_clean_Aligned.out.sam OUTPUT=/users/work/jake/Picard MINIMUM_LENGTH=500 RRNA_FRAGMENT_PERCENTAGE=0.8 METRIC_ACCUMULATION_LEVEL=[ALL_READS] ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Sep 14 11:15:50 GMT 2017] Executing as jake@compute-0-22 on Linux 3.10.0-327.36.3.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11; Picard version: 2.9.0-1-gf5b9f50-SNAPSHOT
WARNING 2017-09-14 11:15:51 SinglePassSamProgram File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
[Thu Sep 14 11:15:51 GMT 2017] picard.analysis.CollectRnaSeqMetrics done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2024275968
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.annotation.AnnotationException: Wrong number of fields in refFlat file /users/work/jake/Picard/All_morph_no_trimmo_0.5TPM_reorder.refFlat at line 1
at picard.annotation.RefFlatReader.load(RefFlatReader.java:80)
at picard.annotation.RefFlatReader.load(RefFlatReader.java:66)
at picard.annotation.GeneAnnotationReader.loadRefFlat(GeneAnnotationReader.java:37)
at picard.analysis.CollectRnaSeqMetrics.setup(CollectRnaSeqMetrics.java:142)
at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:122)
at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:77)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Can you possibly suggest what I am doing wrong? I'm at a bit of a loss now as to what I could do. For reference my refFlat file looks like the following:
Gene.92775::Transcript_99997::g.92775::m.92775 Gene.92775::Transcript_99997::g.92775 Transcript_99997 - 0 1827 1008 1716 1 0, 1827,
Gene.92773::Transcript_99996::g.92773::m.92773 Gene.92773::Transcript_99996::g.92773 Transcript_99996 + 0 525 0 525 1 0, 525,
Gene.92771::Transcript_99994::g.92771::m.92771 Gene.92771::Transcript_99994::g.92771 Transcript_99994 - 0 1170 0 1170 1 0, 1170,
Gene.92769::Transcript_99993::g.92769::m.92769 Gene.92769::Transcript_99993::g.92769 Transcript_99993 - 0 1206 0 1206 1 0, 1206,
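Since the error is about the number of fields on line 1, one thing that can be checked directly is how many fields each line splits into. The sketch below is a minimal Python check, assuming the parser wants exactly 11 tab-separated fields per line (the tab delimiter is an assumption here, not something stated in the error message). The names count_fields and find_bad_lines are made up for this example.

```python
def count_fields(line, sep="\t"):
    """Number of fields in a line when split on the given separator."""
    return len(line.rstrip("\n").split(sep))


def find_bad_lines(lines, expected=11, sep="\t"):
    """Return (line_number, field_count) for lines with the wrong field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        n = count_fields(line, sep)
        if n != expected:
            bad.append((number, n))
    return bad


# Two made-up lines: the first is tab-separated, the second space-separated
tabbed = "\t".join(["geneA", "tx1", "chr1", "+", "0", "525", "0", "525", "1", "0,", "525,"])
spaced = tabbed.replace("\t", " ")
print(find_bad_lines([tabbed, spaced]))
```

To run it on a real file, pass an open file handle instead of the list, e.g. find_bad_lines(open("my.refFlat")), where my.refFlat stands in for the actual path. A line that looks like 11 columns on screen but is separated by spaces counts as a single field under this check.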