I am reading a compressed VCF file with gzfile.
This is the VCF file:
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=HM2,Number=0,Type=Flag,Description="HapMap2 membership">
##INFO=<ID=HM3,Number=0,Type=Flag,Description="HapMap3 membership">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README">
##reference=human_b36_both.fasta
##INFO=<ID=AC,Number=1,Type=Integer,Description="total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="total number of alleles in called genotypes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth from MOSAIK BAM">
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 5 . T GC,G . PASS AA=T;AC=4;AN=114;DP=3251
chr1 7 . G A,CG . PASS AA=G;AC=1;AN=106;DP=2676
chr1 9 rs61733845 CA T . PASS AA=c;AC=3;AN=122;DP=3477
chr1 12 . T A,C . PASS AA=T;AC=1;AN=178;DP=7275
chr1 15 rs1320571 G A . PASS AA=A;AC=6;AN=154;DP=7610;HM2;HM3
chr1 18 rs2760321 T CGC . PASS AA=C;AC=128;AN=146;DP=3383;HM2;HM3
chr1 20 rs2760320 G C . PASS AA=G;AC=13;AN=178;DP=8362;HM2;HM3
I am using the following command:
read.table(gzfile("/home/data/test_set_13.vcf.tar.gz"), skipNul = TRUE, header = FALSE)
However, the first row of the output contains information that does not appear anywhere in the VCF file:
V1 V2 V3 V4 V5 V6 V7 V8
1 test_set_13.vcf000664 1750 001750 00000002247 13317410347 013321 0ustar00gk39gk39000000 000000
2 chr1 5 . T GC,G . PASS AA=T;AC=4;AN=114;DP=3251
3 chr1 7 . G A,CG . PASS AA=G;AC=1;AN=106;DP=2676
4 chr1 9 rs61733845 CA T . PASS AA=c;AC=3;AN=122;DP=3477
5 chr1 12 . T A,C . PASS AA=T;AC=1;AN=178;DP=7275
6 chr1 15 rs1320571 G A . PASS AA=A;AC=6;AN=154;DP=7610;HM2;HM3
7 chr1 18 rs2760321 T CGC . PASS AA=C;AC=128;AN=146;DP=3383;HM2;HM3
8 chr1 20 rs2760320 G C . PASS AA=G;AC=13;AN=178;DP=8362;HM2;HM3
I don't know where the first row of the output comes from. Any ideas?
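A likely explanation, offered as an assumption rather than a confirmed diagnosis: the `.tar.gz` extension and the `ustar` string in the stray row suggest the file is a gzip-compressed tar archive, not a plain gzipped VCF. gzfile() only removes the gzip layer, so the 512-byte tar header (member name, mode, owner/group IDs, size, modification time, checksum, `ustar` magic, owner names) is still in the stream. With its NUL padding dropped by skipNul = TRUE, it would read like the first row above. The minimal sketch below builds a small hypothetical archive so it runs on its own, checks the header for the `ustar` magic, and then extracts the member with untar() before calling read.table(). The names `test_set.vcf`, `work_dir` and `out_dir` are placeholders.

```r
# Minimal sketch, assuming the input is a gzip-compressed tar archive that
# contains one VCF. All file and directory names here are hypothetical.

# Build a tiny example archive so the snippet is self-contained.
work_dir <- tempfile("vcf_demo_")
dir.create(work_dir)
vcf_lines <- c(
  "##fileformat=VCFv4.0",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t5\t.\tT\tGC,G\t.\tPASS\tAA=T;AC=4;AN=114;DP=3251",
  "chr1\t7\t.\tG\tA,CG\t.\tPASS\tAA=G;AC=1;AN=106;DP=2676"
)
writeLines(vcf_lines, file.path(work_dir, "test_set.vcf"))

# utils::tar() stores member paths as given, so archive from inside work_dir.
old_wd <- setwd(work_dir)
utils::tar("test_set.vcf.tar.gz", files = "test_set.vcf",
           compression = "gzip", tar = "internal")
setwd(old_wd)
archive <- file.path(work_dir, "test_set.vcf.tar.gz")

# gzfile() strips only the gzip layer: the first 512 bytes are a tar header.
# In the ustar format the magic string sits at byte offset 257 (258 in R).
con <- gzfile(archive, "rb")
header <- readBin(con, "raw", 512)
close(con)
magic <- rawToChar(header[258:262])
print(magic)

# List the archive members, extract them, then read the VCF itself.
members <- utils::untar(archive, list = TRUE, tar = "internal")
out_dir <- file.path(work_dir, "extracted")
utils::untar(archive, exdir = out_dir, tar = "internal")

# comment.char = "#" skips both the "##" meta lines and the "#CHROM" line.
vcf <- read.table(file.path(out_dir, members[1]), header = FALSE,
                  sep = "\t", comment.char = "#")
print(vcf)
```

For the file in the question, setting `archive` to "/home/data/test_set_13.vcf.tar.gz" and skipping the archive-building step should behave the same way, assuming the archive holds a single VCF. If you do not need the tar wrapper, recompressing the VCF with plain gzip would let gzfile() read it directly.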