How to Check The Result


This page explains how to check your results: how to read the result files, where to look for the alignment results (BAM files), and how to check whether your jobs succeeded.

How to Read The Result File (Fisher's exact test)


The result file contains the output of
Command 4, Mutation Calling & Annotation (Fisher's exact test).

Copy sum_${sample name}_${date you did data analysis}_${sampletype}.exome.result.txt to your local PC and open it in Excel.

The columns are explained below.

For columns that are not explained here, please refer to the ANNOVAR documentation.

			# exome.result.txt
			Chr   Start   End    Ref  Obs  bases_tumor  bases_normal  misRate_tumor  strandRatio_tumor  misRate_normal  strandRatio_normal  p-value
			chr1   1000   1000   G    T    0,0,15,14    0,0,12,1           0.482758                0.7        0.076923                 0.5  1.829449
			chr2   2000   2000   -    TA   80,9         73,2                 0.1125                  0        0.027397                 0.5  1.229181
			chr3   3000   3002   ACA  -    13,8         12,0               0.615384                0.5               0   ---                2.783149
			

Chr Start End
 Position of the mutation candidate.

Ref
 Reference base(s) at the mutation candidate position. Shown as "-" for insertions.

Obs
 Observed (variant) sequence of the mutation candidate. Shown as "-" for deletions.

bases_tumor, bases_normal
 For SNVs: the read count for each base, in the order A,C,G,T. For indels: the depth, followed by the number of reads supporting the indel.

misRate_tumor, misRate_normal
 The mismatch rate.
 SNVs: number of reads carrying the Obs base / depth (total of the A, C, G, and T counts).
  In the table above, 1st row: misRate_tumor = 14/29 = 0.482758
  In the table above, 1st row: misRate_normal = 1/13 = 0.076923
 Indels: number of indel reads / depth.
  In the table above, 2nd row: misRate_tumor = 9/80 = 0.1125
  In the table above, 2nd row: misRate_normal = 2/73 = 0.027397

strandRatio_tumor, strandRatio_normal
 Fraction of the mismatched reads that are mapped to the + strand. The value is 1 if all the mismatched reads map to the + strand, 0 if they all map to the - strand, and 0.5 if they are split equally between the two strands. A value close to 0 or 1 suggests a possible sequencing error. The value is shown as "---" when there are no mismatched reads, as in the 3rd row of the table above.

p-value
 -log10 of the p-value from Fisher's exact test. In R, the values in the table above can be computed like this:
 SNV (1st row): -log10(fisher.test(matrix(as.integer(c(15,12,14,1)),2,2))$p.value)
 Indel (2nd row): -log10(fisher.test(matrix(as.integer(c(71,9,71,2)),2,2))$p.value)
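
If you prefer to cross-check these columns outside R, the following is a minimal Python sketch (standard library only) that recomputes misRate and the -log10 p-value from bases_tumor and bases_normal. It is not the pipeline's own code. The function names are made up for illustration, the 2x2 table is assumed to be built from (depth minus variant reads, variant reads) for each sample, and the two-sided p-value is computed by summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb, log10

BASE_ORDER = "ACGT"  # order of the four SNV counts, as in the column description


def snv_counts(bases, obs):
    """Return (depth, variant_reads) for an SNV field such as '0,0,15,14'."""
    counts = [int(x) for x in bases.split(",")]
    return sum(counts), counts[BASE_ORDER.index(obs)]


def indel_counts(bases):
    """Return (depth, variant_reads) for an indel field such as '80,9'."""
    depth, indel_reads = (int(x) for x in bases.split(","))
    return depth, indel_reads


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lowest, highest + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def neg_log10_fisher(tumor, normal):
    """tumor and normal are (depth, variant_reads) pairs."""
    (t_depth, t_var), (n_depth, n_var) = tumor, normal
    return -log10(fisher_two_sided(t_depth - t_var, t_var, n_depth - n_var, n_var))


# 1st row (SNV, G > T) and 2nd row (insertion) of the sample table.
for tumor, normal in [
    (snv_counts("0,0,15,14", "T"), snv_counts("0,0,12,1", "T")),
    (indel_counts("80,9"), indel_counts("73,2")),
]:
    print(round(tumor[1] / tumor[0], 6),    # misRate_tumor
          round(normal[1] / normal[0], 6),  # misRate_normal
          round(neg_log10_fisher(tumor, normal), 6))
```

For the two rows used here, the printed values should agree with the sample table up to the last digit (the table appears to truncate rather than round).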

Standard filters
 1000g: filter out mutations registered in the 1000 Genomes Project.
 dbSNP: filter out mutations registered in dbSNP.
 misRate_normal: filter out candidates that have a high mismatch rate in the normal sample.

How to Read The Result File (Empirical Bayesian Mutation Calling)


The result file contains the output of
Command 6, Mutation Calling & Annotation (Empirical Bayesian Mutation Calling).

Copy ${TAG}.exome.result.txt to your local PC and open it in Excel.
The columns are explained below.

For columns that are not explained here, please refer to the ANNOVAR documentation.

			# exome.result.txt (SAMPLE)
			Chr   Start   End    Ref  Obs  bases_tumor  bases_normal  misRate_tumor  strandRatio_tumor  depth_tumor  variantNum_tumor  misRate_normal  strandRatio_normal  depth_normal  variantNum_normal       p-value   p-value (+strand)  p-value (-strand)  p-value(Fisher)  alpha (+starnd)  beta (+strand)  alpha (-strand)  beta(-strand)
			chr1   1000   1000   G    T    14,10,15,4   7,1,7,1            0.482758                0.7          29                 14        0.142857                 0.5           14                   2   2.123456789         0.123456789        2.123456789         1.349564     0.100001462     3.982312349      2.545231456    201.1434566
			chr2   2000   2000   -    TA   20,0,69,9    30,1,43,1          0.101123                  0          89                  9        0.027397                 0.5           73                   2   3.123456789                   0        4.123456789        0.9459894             0.1     47.82511302              0.1    22.12343452
			chr3   3000   3002   ACA  -    11,4,10,4    5,0,7,0            0.380952                0.5          21                  8               0   ---                         12                   0   2.123456789         0.123456789        2.123456789         1.182557             0.1     1.123456785              0.1    8.125789943
			

Chr Start End
 the position of the candidate mutation

Ref
 the reference base for that position ('-' for insertions).

Obs
 the variant (alternative) sequence of the mutation candidate ('-' for deletions).

bases_tumor, bases_normal
 four comma-separated values: the sequencing depth on the positive strand, the number of variant reads on the positive strand, the sequencing depth on the negative strand, and the number of variant reads on the negative strand. Given for both the tumor and normal samples.

misRate_tumor, misRate_normal
 the mismatch rates computed in the tumor and normal samples

strandRatio_tumor, strandRatio_normal
 the fraction of variant reads aligned to the positive strand, for the tumor and normal samples.

depth_tumor, depth_normal
 sequencing depths for that position for the tumor and normal samples.

variantNum_tumor, variantNum_normal
 the number of reads supporting the variant in the tumor and normal samples.

p-value
 the negative logarithm of the EBCall p-value. This value combines the two p-values calculated for the positive and negative strands. For how the p-value is calculated, please see the following paper:
 An empirical Bayesian framework for mutation detection from cancer genome sequencing data.
 Nucleic Acids Research (Published: February 10, 2013)

p-value (+strand), p-value (-strand)
 the negative logarithm of the EBCall p-value for the positive and negative strands, respectively.

p-value(Fisher)
 the minus logarithm of the p-value by Fisher's exact test

alpha (+starnd),beta (+strand),alpha (-strand),beta(-strand)
 the estimated parameter values of the beta-binomial sequencing error model for that variant.
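
The count-based columns above can all be derived from bases_tumor and bases_normal. The following is a minimal Python sketch of that arithmetic, not the pipeline's own code. The function name and the dictionary keys are made up for illustration, and returning None when there are no variant reads is an assumption standing in for the "---" shown in the file.

```python
def derived_columns(bases):
    """Derive depth, variantNum, misRate and strandRatio from a field such as
    '14,10,15,4' (depth+, variant reads+, depth-, variant reads-)."""
    depth_pos, var_pos, depth_neg, var_neg = (int(x) for x in bases.split(","))
    depth = depth_pos + depth_neg
    variant_num = var_pos + var_neg
    return {
        "depth": depth,
        "variantNum": variant_num,
        # Mismatch rate: variant reads over total depth.
        "misRate": variant_num / depth if depth else 0.0,
        # Fraction of variant reads on the positive strand; undefined
        # (None here) when there are no variant reads at all.
        "strandRatio": var_pos / variant_num if variant_num else None,
    }


# bases_tumor and bases_normal from the three sample rows above.
for bases in ["14,10,15,4", "7,1,7,1", "20,0,69,9", "30,1,43,1", "11,4,10,4", "5,0,7,0"]:
    print(bases, derived_columns(bases))
```

Note that for the first row this gives a tumor strand ratio of 10/14 (about 0.714), where the sample table shows 0.7, so the sample value is presumably rounded.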

Standard filters
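 misRate_tumor: filter out candidates that have a low mismatch rate in the tumor sample.
 misRate_normal: filter out candidates that have a high mismatch rate in the normal sample.
 p-value, p-value(Fisher): filter out candidates with low values in these columns. Because the columns hold the negative logarithm of the p-value, a low value means weak evidence for the mutation.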
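
If you would rather apply these filters with a script than in Excel, the following minimal Python sketch shows one way to do it. It assumes the result file is tab-separated with the column names shown above as its header row. All four cut-off values are hypothetical and are there only to make the example run, so please choose values that suit your own data.

```python
import csv
import io

# Hypothetical cut-offs, for illustration only.
MIN_MISRATE_TUMOR = 0.1
MAX_MISRATE_NORMAL = 0.05
MIN_EBCALL_SCORE = 2.0   # applied to the "p-value" column
MIN_FISHER_SCORE = 1.0   # applied to the "p-value(Fisher)" column


def passes_standard_filters(row):
    """row is a dict keyed by the column names of the result file."""
    return (float(row["misRate_tumor"]) >= MIN_MISRATE_TUMOR
            and float(row["misRate_normal"]) <= MAX_MISRATE_NORMAL
            and float(row["p-value"]) >= MIN_EBCALL_SCORE
            and float(row["p-value(Fisher)"]) >= MIN_FISHER_SCORE)


def filter_candidates(handle):
    """Read a tab-separated result table and keep the rows that pass."""
    return [row for row in csv.DictReader(handle, delimiter="\t")
            if passes_standard_filters(row)]


# A few columns from the three sample rows above, joined with tabs.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("Chr", "Start", "misRate_tumor", "misRate_normal", "p-value", "p-value(Fisher)"),
    ("chr1", "1000", "0.482758", "0.142857", "2.123456789", "1.349564"),
    ("chr2", "2000", "0.101123", "0.027397", "3.123456789", "0.9459894"),
    ("chr3", "3000", "0.380952", "0", "2.123456789", "1.182557"),
])

kept = filter_candidates(io.StringIO(SAMPLE))
print([row["Chr"] for row in kept])
```

To filter a real file, you could pass an open file handle instead of the io.StringIO object, for example open(path, newline="").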

How to Read The Summary File


The summary file is generated by Command 2, ALIGNMENT & GENERATE SUMMARY TABLE. It contains the alignment rate and coverage information.

Copy your exome/data/output/${sample name}/${RUN date}/summary/ directory to your local PC. The results you should at least look at are described below.

List of result files:
			# Output Dir
			exome/data/output/${Sample name}/${RUN date}/summary/${type}
			# Result Files
			AlignmentSummaryMetrics.txt
			ga.metrics
			GcBiasDetailMetrics.pdf
			GcBiasDetailMetrics.txt
			GcBiasSummaryMetrics.txt
			InsertSizeMetrics.pdf
			InsertSizeMetrics.txt
			MeanQualityByCycle.pdf
			MeanQualityByCycle.txt
			QualityScoreDistribution.pdf
			QualityScoreDistribution.txt

AlignmentSummaryMetrics.txt
 PF_READS: Number of reads that passed the Illumina filter. PF_READS should be the same as TOTAL_READS.
 PF_READS_ALIGNED: Number of reads that are aligned.
 PCT_PF_READS_ALIGNED: Percentage of reads that are aligned.
 For more information, please refer to the Picard javadoc (class AlignmentSummaryMetrics).

CalculateHsMetrics.txt
 PF_UNIQUE_READS: Number of reads after removing duplicates.
 PCT_PF_UQ_READS_ALIGNED: Alignment percentage after removing duplicates.
 MEAN_TARGET_COVERAGE: Mean coverage of the target regions.
 For more information, please refer to the Picard javadoc (class HsMetrics).

MeanQualityByCycle.pdf
 You can check if the qualities are within your expectations.

InsertSizeMetrics.pdf
 You can check if the library's insert size is within your expectations.
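
The checks on the metrics files above can also be scripted. The following minimal Python sketch assumes the usual layout of a Picard metrics text file: comment lines starting with "#", then a tab-separated header row, then one or more data rows, ended by a blank line. The function names are made up for illustration, and the numbers in the sample text are invented purely so that the example runs. They are not real results.

```python
def read_picard_metrics(lines):
    """Return the metrics rows of a Picard-style metrics file as a list of dicts."""
    header, rows = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#") or not line.strip():
            if header is not None:
                break  # end of the metrics table (a histogram section may follow)
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


def pf_matches_total(row):
    """True when PF_READS equals TOTAL_READS, as the text above expects."""
    return int(row["PF_READS"]) == int(row["TOTAL_READS"])


# Invented sample content, for illustration only.
SAMPLE = (
    "## METRICS CLASS\tAlignmentSummaryMetrics\n"
    "CATEGORY\tTOTAL_READS\tPF_READS\tPF_READS_ALIGNED\tPCT_PF_READS_ALIGNED\n"
    "PAIR\t1000\t1000\t980\t0.98\n"
    "\n"
)

for row in read_picard_metrics(SAMPLE.splitlines()):
    print(row["CATEGORY"], pf_matches_total(row), row["PCT_PF_READS_ALIGNED"])
```

For a real file you could pass an open file handle, for example read_picard_metrics(open("AlignmentSummaryMetrics.txt")).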

How to Read The Log File


Note that all the logs are stored under the same output directory, "exome/log".

The Main Shell (map_bwa.sh, realign_gatk.sh, fisher_test.sh) log shows the return values of the qsub'ed jobs.

Normal log file example:

		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		Cleaning up started.
		
		Mon Apr  9 22:32:54 2012
		
		Cleaning up in progress.
		 Waiting for submitted jobs to be finished.
		
		job id : 33394 failed =0 exit_status=0
		job id : 33393 failed =0 exit_status=0
		job id : 33392 failed =0 exit_status=0
		job id : 33391 failed =0 exit_status=0
		Cleaning up finished.
		Mon Apr  9 22:37:20 2012
		
		# comments
		# In this job, there were 4 qsub'ed rm.sh jobs.
		# All of them finished with "failed=0 exit_status=0",
		# so the Main Shell also finished successfully.

Error log file example:

		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		['qsub', '-e', 'log/sample_100422', '-o', 'log/sample_100422', 'script/rm.sh']
		Cleaning up started.
		
		Mon Apr  9 22:32:54 2012
		
		Cleaning up in progress.
		 Waiting for submitted jobs to be finished.
		
		job id : 33394 failed =0 exit_status=0
		job id : 33393 failed =1 exit_status=125
		Cleaning up finished.
		Mon Apr  9 22:37:20 2012
		
		# comments
		# In this job, there were 4 qsub'ed rm.sh jobs.
		# One of the jobs finished with "failed=1 exit_status=125".
		# If one of the "child" jobs fails, the Main Shell stops immediately.

When an error occurs, look at the qsub logs of the failed job reported in the Main Shell log.

		# output dir
		exome/log/${sample}_${RUN date}
		# std output log
		rm.sh.o33393
		# err log
		rm.sh.e33393
		
		# Sometimes your jobs fail due to a hiccup in the Supercomputer system.
		# Resubmitting the job often works.
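
Finding the failed job in a long Main Shell log can be automated. The following minimal Python sketch scans the log text for "job id" lines in the format shown in the examples above and builds the names of the matching qsub log files. The function names are made up for illustration, and the script name (rm.sh here) has to be supplied by you, because the "job id" lines do not contain it.

```python
import re

# Matches lines such as "job id : 33393 failed =1 exit_status=125".
JOB_LINE = re.compile(
    r"job id\s*:\s*(\d+)\s+failed\s*=\s*(\d+)\s+exit_status\s*=\s*(\d+)"
)


def failed_jobs(log_text):
    """Return (job_id, failed, exit_status) for every job that did not end with 0/0."""
    failures = []
    for match in JOB_LINE.finditer(log_text):
        job_id, failed, exit_status = (int(group) for group in match.groups())
        if failed != 0 or exit_status != 0:
            failures.append((job_id, failed, exit_status))
    return failures


def qsub_log_names(script, job_id):
    """Return the standard output and error log file names for a qsub'ed script."""
    return f"{script}.o{job_id}", f"{script}.e{job_id}"


# The two "job id" lines from the error log example above.
LOG = """
job id : 33394 failed =0 exit_status=0
job id : 33393 failed =1 exit_status=125
"""

for job_id, failed, exit_status in failed_jobs(LOG):
    print(job_id, failed, exit_status, qsub_log_names("rm.sh", job_id))
```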

List of Main Shell log filenames

		# bwa alignment logs
		map_bwa.sh.o0000000
		map_bwa.sh.e0000000
		# realign job logs
		realign_gatk.sh.o0000000
		realign_gatk.sh.e0000000
		# your data analysis logs
		fisher_test.sh.o0000000
		fisher_test.sh.e0000000
		
		# {o,e}0000000 should be replaced with your job ID.

Note that the copy number data analysis job does not leave logs like the ones above. To check it for errors, follow the next section, How to Confirm The Result of The Submitted Job.

How to Confirm The Result of The Submitted Job


You can check whether your jobs succeeded on the Supercomputer.

Please use the qacct2 command available on the HGC Supercomputer system.

			# Usage:
			qacct2 -o $USER -b yyyyMMddhhmm -e yyyyMMddhhmm -l -f
			
			# comments
			# -o your username (echo $USER)
			# -b the earliest end time of the jobs to be summarized, in the format [yyyyMMddhhmm]
			# -e the latest end time of the jobs to be summarized, in the format [yyyyMMddhhmm]
			# -l list the results.
			# -f display only jobs that failed in SGE or ended with an abnormal exit status.
			
			# Example: the user genomon searches for failed jobs that ended between 2012/07/01 00:00 and 2012/07/02 00:00
			qacct2 -o genomon -b 201207010000 -e 201207020000 -l -f
			|    owner|  jobid|     task|slot|  pe_id| granted_pe|ext|fail|   qname| host|jobname    |      end_time|  clock| mmem|rmem|  r_q| r_cpu|qdel|fail_txt  |      Rq|  Rm|Ropt                     |
			|  genomon|1938906|undefined|   1|   NONE|       NONE|  1|   0| mjobs.q|c369i|split.sh   |20120701-11:20|    266| 3.0G|  4G| NONE|  NONE|NONE|N/A       | mjobs.q|  4G|-l s_vmem=4G,mem_req=4   |
			|  genomon|1938901|undefined|   1|   NONE|       NONE|152| 100| mjobs.q|c628i|split.sh   |20120701-10:45|   4449| 4.0G|  4G| NONE|  NONE|NONE|assumedly | mjobs.q|  8G|-l s_vmem=8G,mem_req=8   |
			|  genomon|1938928|undefined|   1|   NONE|       NONE|  1|   0| mjobs.q|c722i|split.sh   |20120701-09:52|      0| 0.0G|  2G| NONE|  NONE|NONE|N/A       | mjobs.q|  2G|-l s_vmem=2G,mem_req=2   |
			
			# Look at the ext and fail columns.
			# If both ext and fail are zero, the job finished successfully.

For jobs that didn't finish correctly, look at the logs.

The log directory is exome/log/xxxxx, and the log file naming conventions are:
<jobname>.o<jobid> for the standard output log (example: split.sh.o1938906).
<jobname>.e<jobid> for the error log (example: split.sh.e1938906).

			# Change directory to the log directory. (For copy number data analysis, go to exome/copy_number/log)
			cd exome/log/xxxxxx
			# Look for the logs of the failed job.
			ls *1938906
			# The full names of the files to check
			split.sh.o1938906
			split.sh.e1938906
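
When qacct2 lists many jobs, you may want to pick out the unsuccessful ones and their log file names with a script. The following minimal Python sketch parses the pipe-delimited table shown above. It assumes the output always has that shape (a header row followed by one row per job), the function names are made up for illustration, and the sample text is a copy of the example output above with most columns trimmed for brevity.

```python
def parse_qacct2_table(text):
    """Turn the pipe-delimited table printed by qacct2 -l into a list of dicts."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip comments and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unsuccessful(rows):
    """Keep the jobs whose ext or fail column is not zero."""
    return [row for row in rows if row["ext"] != "0" or row["fail"] != "0"]


def log_file_names(row):
    """Build <jobname>.o<jobid> and <jobname>.e<jobid> for one job."""
    return (f'{row["jobname"]}.o{row["jobid"]}',
            f'{row["jobname"]}.e{row["jobid"]}')


# Trimmed copy of the example output above (only five of the columns).
SAMPLE = """
|    owner|  jobid|ext|fail|jobname    |
|  genomon|1938906|  1|   0|split.sh   |
|  genomon|1938901|152| 100|split.sh   |
|  genomon|1938928|  1|   0|split.sh   |
"""

for row in unsuccessful(parse_qacct2_table(SAMPLE)):
    print(row["jobid"], row["ext"], row["fail"], log_file_names(row))
```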

If the error log file doesn't tell you much about the failure, please look at our FAQ first. If none of the answers there helps, please contact us via email.
