On this page, you'll learn how to check your results: how to read the result files, where to look for the alignment results (BAM files), and how to check whether your jobs succeeded.
Links inside this page:
How to Read The Result File (Fisher's exact test)
How to Read The Result File (Empirical Bayesian Mutation Calling)
How to Read The Summary File
How to Read The Log File
How to Confirm The Result of the Submitted Job
How to Read The Result File (Fisher's exact test)
Copy sum_${sample name}_${date you did data analysis}_${sampletype}.exome.result.txt to your local PC and open it in Excel.
The columns are explained below.
For columns that are not explained here, please refer to ANNOVAR.
# exome.result.txt
Chr   Start  End   Ref  Obs  bases_tumor  bases_normal  misRate_tumor  strandRatio_tumor  misRate_normal  strandRatio_normal  p-value
chr1  1000   1000  G    T    0,0,15,14    0,0,12,1      0.482758       0.7                0.076923        0.5                 1.829449
chr2  2000   2000  -    TA   80,9         73,2          0.1125         0                  0.027397        0.5                 1.229181
chr3  3000   3002  ACA  -    13,8         12,0          0.615384       0.5                0               ---                 2.783149
Chr Start End
The position of the mutation candidate.
Ref
The reference base(s) at the candidate position ('-' for insertions).
Obs
The base sequence of the mutation candidate ('-' for deletions).
bases_tumor, bases_normal
For SNVs: the read counts of each base, in the order A,C,G,T. For indels: the depth and the number of indel reads.
misRate_tumor, misRate_normal
The computed mismatch rate.
SNVs: #Obs (number of reads carrying the Obs base) / Depth (total of the A, C, G, and T counts).
From the 1st line of the table above, misRate_tumor = 14/29 = 0.482758
From the 1st line of the table above, misRate_normal = 1/13 = 0.076923
Indels: indel reads / Depth.
From the 2nd line of the table above, misRate_tumor = 9/80 = 0.1125
From the 2nd line of the table above, misRate_normal = 2/73 = 0.027397
strandRatio_tumor, strandRatio_normal
The fraction of mismatched reads that are mapped to the + strand. If all the mismatched reads are aligned to the + strand, the value is 1; if all are aligned to the - strand, the value is 0; if they are split equally between the + and - strands, the value is 0.5. (A value close to 0 or 1 suggests a possible sequencing error.)
p-value
-log10 of the p-value of Fisher's exact test. In R, it can be computed as follows, using the counts from the 1st (SNV) and 2nd (indel) lines of the table above:
SNV: -log10(fisher.test(matrix(as.integer(c(15,12,14,1)),2,2))$p.value)
Indel: -log10(fisher.test(matrix(as.integer(c(71,9,71,2)),2,2))$p.value)
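As a cross-check outside R, here is a minimal Python sketch, using only the standard library, that recomputes misRate and the -log10 Fisher p-value from the counts in the 1st line of the table above. The function names are hypothetical and are not part of the pipeline. The two-sided p-value sums the probabilities of all 2x2 tables that are no more likely than the observed one, which is the convention documented for R's fisher.test. Note that the table values appear to be truncated to six decimals, so the last digit may differ from a rounded result.

```python
from math import comb, log10


def mis_rate_snv(base_counts: str, obs: str) -> float:
    """Mismatch rate for an SNV: reads carrying the Obs base / total A,C,G,T reads."""
    counts = dict(zip("ACGT", (int(n) for n in base_counts.split(","))))
    return counts[obs] / sum(counts.values())


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-7))


# 1st line of the table above: Ref G, Obs T, tumor 0,0,15,14, normal 0,0,12,1
print("misRate_tumor :", mis_rate_snv("0,0,15,14", "T"))
print("misRate_normal:", mis_rate_snv("0,0,12,1", "T"))
# Rows: tumor, normal. Columns: reads with the Ref base, reads with the Obs base.
print("-log10(p)     :", -log10(fisher_two_sided(15, 14, 12, 1)))
```

Transposing the table, as the R calls above do for the SNV and indel cases, does not change the p-value.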
Standard filters
1000g: filter out mutations registered in the 1000 Genomes Project.
dbSNP: filter out mutations registered in dbSNP.
misRate_normal: filter out candidates that have a high mismatch rate in the normal sample.
How to Read The Result File (Empirical Bayesian Mutation Calling)
Copy ${TAG}.exome.result.txt to your local PC and open it in Excel.
The columns are explained below.
For columns that are not explained here, please refer to ANNOVAR.
# exome.result.txt (SAMPLE; shown transposed, one column per line, with the values of the three example records)
Column              Record 1     Record 2     Record 3
Chr                 chr1         chr2         chr3
Start               1000         2000         3000
End                 1000         2000         3002
Ref                 G            -            ACA
Obs                 T            TA           -
bases_tumor         14,10,15,4   20,0,69,9    11,4,10,4
bases_normal        7,1,7,1      30,1,43,1    5,0,7,0
misRate_tumor       0.482758     0.101123     0.380952
strandRatio_tumor   0.7          0            0.5
depth_tumor         29           89           21
variantNum_tumor    14           9            8
misRate_normal      0.142857     0.027397     0
strandRatio_normal  0.5          0.5          ---
depth_normal        14           73           12
variantNum_normal   2            2            0
p-value             2.123456789  3.123456789  2.123456789
p-value (+strand)   0.123456789  0            0.123456789
p-value (-strand)   2.123456789  4.123456789  2.123456789
p-value(Fisher)     1.349564     0.9459894    1.182557
alpha (+strand)     0.100001462  0.1          0.1
beta (+strand)      3.982312349  47.82511302  1.123456785
alpha (-strand)     2.545231456  0.1          0.1
beta(-strand)       201.1434566  22.12343452  8.125789943
Chr Start End
The position of the mutation candidate.
Ref
The reference base(s) at that position ('-' for insertions).
Obs
The altered sequence of the mutation candidate ('-' for deletions).
bases_tumor, bases_normal
Four comma-separated values for each of the tumor and normal samples: the sequencing depth on the positive strand, the number of variant reads on the positive strand, the sequencing depth on the negative strand, and the number of variant reads on the negative strand.
misRate_tumor, misRate_normal
The mismatch rates computed in the tumor and normal samples.
strandRatio_tumor, strandRatio_normal
The ratio of variant reads aligned to the positive strand in the tumor and normal samples.
depth_tumor, depth_normal
The sequencing depths at that position in the tumor and normal samples.
variantNum_tumor, variantNum_normal
The number of supporting variant reads in the tumor and normal samples.
p-value
The negative logarithm of the EBCall p-value. This value combines the two p-values calculated on the positive and negative strands.
For the calculation of the p-value, please see the following paper:
An empirical Bayesian framework for mutation detection from cancer genome sequencing data.
Nucleic Acids Research (Published: February 10, 2013)
p-value (+strand), p-value (-strand)
The negative logarithm of the EBCall p-value for the positive and negative strands, respectively.
p-value(Fisher)
The negative logarithm of the p-value of Fisher's exact test.
alpha (+strand), beta (+strand), alpha (-strand), beta(-strand)
The estimated parameter values of the beta-binomial sequencing error model for that variant.
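The relationship between the bases_* fields and the derived columns can be sketched in Python as follows. This is an illustration inferred from the sample records above, not code taken from EBCall; the function name summarize_bases and the returned keys are hypothetical.

```python
def summarize_bases(bases: str) -> dict:
    """Derive depth, variantNum, misRate and strandRatio from a bases_* field.

    The field is read as: depth(+), variant reads(+), depth(-), variant reads(-).
    """
    depth_pos, var_pos, depth_neg, var_neg = (int(n) for n in bases.split(","))
    depth = depth_pos + depth_neg
    variant_num = var_pos + var_neg
    return {
        "depth": depth,
        "variantNum": variant_num,
        "misRate": variant_num / depth if depth else None,
        # Undefined without variant reads (the sample file shows '---' here).
        "strandRatio": var_pos / variant_num if variant_num else None,
    }


# bases_tumor of records 1 and 2, and bases_normal of record 3, from the sample above
for bases in ("14,10,15,4", "20,0,69,9", "5,0,7,0"):
    print(bases, summarize_bases(bases))
```

For the first record this gives a depth of 29 and 14 variant reads, matching depth_tumor and variantNum_tumor in the sample.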
Standard filters
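misRate_tumor: filter out candidates that have a low mismatch rate in the tumor sample.
misRate_normal: filter out candidates that have a high mismatch rate in the normal sample.
p-value, p-value(Fisher): filter out candidates that have a low value in these columns. Because the columns hold the negative logarithm of the p-value, a low value means weak evidence for the mutation.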
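If you prefer a script to Excel, the filters above could be applied along these lines. This is a minimal sketch: the three cut-off values are hypothetical placeholders, not recommended settings, and the file is assumed to be tab-delimited with the column names shown in the sample. Only a few columns of the three sample records are reproduced here to keep the example self-contained.

```python
import csv
import io

# Hypothetical cut-offs for illustration only; choose values that suit your data.
MIN_MISRATE_TUMOR = 0.1
MAX_MISRATE_NORMAL = 0.1
MIN_MINUS_LOG10_P = 3.0


def passes(row: dict) -> bool:
    """Keep a candidate only if it clears all three filters."""
    return (float(row["misRate_tumor"]) >= MIN_MISRATE_TUMOR
            and float(row["misRate_normal"]) <= MAX_MISRATE_NORMAL
            and float(row["p-value"]) >= MIN_MINUS_LOG10_P)


# A few columns of the sample records (assumed tab-delimited).
sample = (
    "Chr\tStart\tmisRate_tumor\tmisRate_normal\tp-value\n"
    "chr1\t1000\t0.482758\t0.142857\t2.123456789\n"
    "chr2\t2000\t0.101123\t0.027397\t3.123456789\n"
    "chr3\t3000\t0.380952\t0\t2.123456789\n"
)

kept = [row for row in csv.DictReader(io.StringIO(sample), delimiter="\t")
        if passes(row)]
for row in kept:
    print(row["Chr"], row["Start"])
```

To run it on a real result file, replace io.StringIO(sample) with an opened file handle. The same pattern extends to the p-value(Fisher) column.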
How to Read The Summary File
The summary file is obtained in Command 2, ALIGNMENT & GENERATE SUMMARY TABLE. There you should find the alignment rate and coverage information.
Copy your exome/data/output/${sample name}/${RUN date}/summary/ directory to your local PC. Let's go over the results you should at least look at.
List of result files:
    # Output Dir
    exome/data/output/${Sample name}/${RUN date}/summary/${type}
    # Result Files
    AlignmentSummaryMetrics.txt
    ga.metrics
    GcBiasDetailMetrics.pdf
    GcBiasDetailMetrics.txt
    GcBiasSummaryMetrics.txt
    InsertSizeMetrics.pdf
    InsertSizeMetrics.txt
    MeanQualityByCycle.pdf
    MeanQualityByCycle.txt
    QualityScoreDistribution.pdf
    QualityScoreDistribution.txt
AlignmentSummaryMetrics.txt
How to Read The Log File
Note that all the logs are stored under the same output directory, exome/log.
The Main Shell (map_bwa.sh, realign_gatk.sh, fisher_test.sh) log shows the return values of the qsub'ed jobs.
Normal log file example:
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    Cleaning up started. Mon Apr 9 22:32:54 2012
    Cleaning up in progress. Waiting for submitted jobs to be finished.
    job id : 33394 failed =0 exit_status=0
    job id : 33393 failed =0 exit_status=0
    job id : 33392 failed =0 exit_status=0
    job id : 33391 failed =0 exit_status=0
    Cleaning up finished. Mon Apr 9 22:37:20 2012
    # comments
    # In this job, there were 4 qsub'ed rm.sh jobs.
    # All the jobs finished with "failed=0 exit_status=0,"
    # so the Main Shell also finished successfully.
Error log file example:
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    ['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
    Cleaning up started. Mon Apr 9 22:32:54 2012
    Cleaning up in progress. Waiting for submitted jobs to be finished.
    job id : 33394 failed =0 exit_status=0
    job id : 33393 failed =1 exit_status=125
    Cleaning up finished. Mon Apr 9 22:37:20 2012
    # comment
    # In this job, there were 4 qsub'ed rm.sh jobs.
    # One of the jobs finished with "failed=1 exit_status=125."
    # If one of the "child" jobs fails, the Main Shell stops immediately.
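Scanning a long Main Shell log by eye is error-prone, so the check described in the examples above can be sketched in Python. This assumes the "job id : N failed =F exit_status=S" line layout shown in the examples; the function name failed_jobs is hypothetical and not part of the pipeline.

```python
import re

# Matches lines such as "job id : 33393 failed =1 exit_status=125".
JOB_LINE = re.compile(
    r"job id\s*:\s*(\d+)\s+failed\s*=\s*(\d+)\s+exit_status\s*=\s*(\d+)"
)


def failed_jobs(log_text: str) -> list:
    """Return (job_id, failed, exit_status) for every job that did not end with 0/0."""
    return [(job_id, int(failed), int(status))
            for job_id, failed, status in JOB_LINE.findall(log_text)
            if int(failed) != 0 or int(status) != 0]


# Excerpt from the error log file example above
log_text = """\
Cleaning up in progress. Waiting for submitted jobs to be finished.
job id : 33394 failed =0 exit_status=0
job id : 33393 failed =1 exit_status=125
Cleaning up finished. Mon Apr 9 22:37:20 2012
"""

for job_id, failed, status in failed_jobs(log_text):
    print(f"job {job_id}: failed={failed} exit_status={status}")
```

The job IDs it reports are the ones to look up in the qsub log files.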
When an error occurs, look at the qsub logs of the failed job.
    # output dir
    exome/log/${sample}_${RUN date}
    # std output log
    rm.sh.o33393
    # err log
    rm.sh.e33393
    # Sometimes your jobs fail due to a hiccup in the Supercomputer system.
    # Resubmission often works.
List of Main Shell log filenames
    # bwa alignment logs
    map_bwa.sh.o0000000
    map_bwa.sh.e0000000
    # realign job logs
    realign_gatk.sh.o0000000
    realign_gatk.sh.e0000000
    # your data analysis logs
    fisher_test.sh.o0000000
    fisher_test.sh.e0000000
    # {o,e}0000000 should be replaced with your job ID.
Note that the copy number data analysis job doesn't leave logs corresponding to the above. For that job, check for errors by following the next section, How to Confirm The Result of The Submitted Job.
How to Confirm The Result of the Submitted Job
You can check on the Supercomputer whether your jobs succeeded.
Please use the qacct2 command available on the HGC Supercomputer system.
    # Usage:
    qacct2 -o $USER -b yyyyMMddhhmm -e yyyyMMddhhmm -l -f
    # comments
    # -o your username (echo $USER)
    # -b The earliest end time for jobs to be summarized, in the format [yyyyMMddhhmm]
    # -e The latest end time for jobs to be summarized, in the format [yyyyMMddhhmm]
    # -l list the results.
    # -f display only jobs that failed in SGE or had an abnormal exit status.
    # Example: user genomon searches for _errored_ jobs between 2012/07/01 00:00 and 2012/07/02 00:00
    qacct2 -o genomon -b 201207010000 -e 201207020000 -l -f
    | owner   | jobid   | task      | slot | pe_id | granted_pe | ext | fail | qname   | host  | jobname  | end_time       | clock | mmem | rmem | r_q  | r_cpu | qdel | fail_txt  | Rq      | Rm | Ropt                   |
    | genomon | 1938906 | undefined | 1    | NONE  | NONE       | 1   | 0    | mjobs.q | c369i | split.sh | 20120701-11:20 | 266   | 3.0G | 4G   | NONE | NONE  | NONE | N/A       | mjobs.q | 4G | -l s_vmem=4G,mem_req=4 |
    | genomon | 1938901 | undefined | 1    | NONE  | NONE       | 152 | 100  | mjobs.q | c628i | split.sh | 20120701-10:45 | 4449  | 4.0G | 4G   | NONE | NONE  | NONE | assumedly | mjobs.q | 8G | -l s_vmem=8G,mem_req=8 |
    | genomon | 1938928 | undefined | 1    | NONE  | NONE       | 1   | 0    | mjobs.q | c722i | split.sh | 20120701-09:52 | 0     | 0.0G | 2G   | NONE | NONE  | NONE | N/A       | mjobs.q | 2G | -l s_vmem=2G,mem_req=2 |
    # You should look at the columns ext and fail.
    # If both ext and fail are zero, the job finished normally.
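When the listing is long, the ext and fail check can be scripted. The following is a minimal Python sketch that assumes the pipe-delimited layout shown in the example output above; the function name parse_qacct2 is hypothetical, and the script only reads text that you have already saved from qacct2.

```python
def parse_qacct2(text: str) -> list:
    """Turn pipe-delimited qacct2 -l output into a list of dicts keyed by column name."""
    rows = [[cell.strip() for cell in line.strip().strip("|").split("|")]
            for line in text.splitlines() if line.strip().startswith("|")]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


# The example output above, pasted as-is
output = """\
| owner| jobid| task|slot| pe_id| granted_pe|ext|fail| qname| host|jobname | end_time| clock| mmem|rmem| r_q| r_cpu|qdel|fail_txt | Rq| Rm|Ropt |
| genomon|1938906|undefined| 1| NONE| NONE| 1| 0| mjobs.q|c369i|split.sh |20120701-11:20| 266| 3.0G| 4G| NONE| NONE|NONE|N/A | mjobs.q| 4G|-l s_vmem=4G,mem_req=4 |
| genomon|1938901|undefined| 1| NONE| NONE|152| 100| mjobs.q|c628i|split.sh |20120701-10:45| 4449| 4.0G| 4G| NONE| NONE|NONE|assumedly | mjobs.q| 8G|-l s_vmem=8G,mem_req=8 |
| genomon|1938928|undefined| 1| NONE| NONE| 1| 0| mjobs.q|c722i|split.sh |20120701-09:52| 0| 0.0G| 2G| NONE| NONE|NONE|N/A | mjobs.q| 2G|-l s_vmem=2G,mem_req=2 |
"""

jobs = parse_qacct2(output)
# A job needs attention when either ext or fail is non-zero.
suspect = [job for job in jobs if job["ext"] != "0" or job["fail"] != "0"]
for job in suspect:
    print(job["jobname"], job["jobid"], "ext=" + job["ext"], "fail=" + job["fail"])
```

Because the example was produced with -f, every listed job has a non-zero ext or fail value and is reported.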
For jobs that didn't finish correctly, look at the logs.
The log directory location is exome/log/xxxxx, and the log file naming conventions are:
<jobname>.o<jobid> for the standard output log (example: split.sh.o1938906).
<jobname>.e<jobid> for the error log (example: split.sh.e1938906).
    # Change directory to the log directory. (For copy number data analysis, go to exome/copy_number/log)
    cd exome/log/xxxxxx
    # Look for the files of the failed job.
    ls *1938906
    # The full names of the files to check
    split.sh.o1938906
    split.sh.e1938906
If the error log file doesn't tell you much about the failure, please look at our FAQ first. If none of the answers there helps you, please contact us via email.