Reading with gzfile gives a strange line in the output

Time: 2018-07-25 12:56:40

Tags: r rstudio read.table gz vcf

I am using gzfile to read a compressed VCF file.

This is the vcf file:
##fileformat=VCFv4.0
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=HM2,Number=0,Type=Flag,Description="HapMap2 membership">
##INFO=<ID=HM3,Number=0,Type=Flag,Description="HapMap3 membership">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README">
##reference=human_b36_both.fasta
##INFO=<ID=AC,Number=1,Type=Integer,Description="total number of alternate alleles in called genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="total number of alleles in called genotypes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth from MOSAIK BAM">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
chr1    5   .   T   GC,G    .   PASS    AA=T;AC=4;AN=114;DP=3251
chr1    7   .   G   A,CG    .   PASS    AA=G;AC=1;AN=106;DP=2676
chr1    9   rs61733845  CA  T   .   PASS    AA=c;AC=3;AN=122;DP=3477
chr1    12  .   T   A,C .   PASS    AA=T;AC=1;AN=178;DP=7275
chr1    15  rs1320571   G   A   .   PASS    AA=A;AC=6;AN=154;DP=7610;HM2;HM3
chr1    18  rs2760321   T   CGC .   PASS    AA=C;AC=128;AN=146;DP=3383;HM2;HM3
chr1    20  rs2760320   G   C   .   PASS    AA=G;AC=13;AN=178;DP=8362;HM2;HM3

I am using the following command:

read.table(gzfile("/home/data/test_set_13.vcf.tar.gz") , skipNul = TRUE, header = FALSE)

But the first row of the output contains information that is not in the VCF file:

                     V1   V2         V3          V4          V5     V6                     V7                                 V8
1 test_set_13.vcf000664 1750     001750 00000002247 13317410347 013321 0ustar00gk39gk39000000                             000000
2                  chr1    5          .           T        GC,G      .                   PASS           AA=T;AC=4;AN=114;DP=3251
3                  chr1    7          .           G        A,CG      .                   PASS           AA=G;AC=1;AN=106;DP=2676
4                  chr1    9 rs61733845          CA           T      .                   PASS           AA=c;AC=3;AN=122;DP=3477
5                  chr1   12          .           T         A,C      .                   PASS           AA=T;AC=1;AN=178;DP=7275
6                  chr1   15  rs1320571           G           A      .                   PASS   AA=A;AC=6;AN=154;DP=7610;HM2;HM3
7                  chr1   18  rs2760321           T         CGC      .                   PASS AA=C;AC=128;AN=146;DP=3383;HM2;HM3
8                  chr1   20  rs2760320           G           C      .                   PASS  AA=G;AC=13;AN=178;DP=8362;HM2;HM3

I don't know where the first row of the output comes from. Any ideas?
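A likely explanation, judging from the file name and the output: the file is a `.tar.gz`, and `gzfile()` only undoes the gzip layer. What `read.table` then sees is the raw tar stream, whose first 512 bytes are the tar header (member name `test_set_13.vcf`, mode `000664`, the `ustar` magic string, and the owner `gk39`). That header runs straight into the first VCF line, which is how it ends up as row 1. The sketch below is one way around it, not the only one: extract the member with `untar()` and read the extracted file. To stay runnable it builds a small stand-in archive in a temporary directory; the temporary paths and the variable names (`work_dir`, `archive`, `out_dir`) are illustrative assumptions, and in practice `archive` would point at `/home/data/test_set_13.vcf.tar.gz`.

```r
# Build a small stand-in .tar.gz so the sketch runs anywhere.
# (Assumption: the real archive holds a single VCF member, as the
# tar header in the question's output suggests.)
work_dir <- tempfile("vcf_demo")
dir.create(work_dir)
vcf_lines <- c(
  "##fileformat=VCFv4.0",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t5\t.\tT\tGC,G\t.\tPASS\tAA=T;AC=4;AN=114;DP=3251",
  "chr1\t7\t.\tG\tA,CG\t.\tPASS\tAA=G;AC=1;AN=106;DP=2676"
)
old_wd <- setwd(work_dir)
writeLines(vcf_lines, "test_set_13.vcf")
tar("test_set_13.vcf.tar.gz", files = "test_set_13.vcf",
    compression = "gzip", tar = "internal")
setwd(old_wd)

# In real use, set this to the actual archive path instead.
archive <- file.path(work_dir, "test_set_13.vcf.tar.gz")

# List the members of the archive, then extract the first one.
members <- untar(archive, list = TRUE)
out_dir <- file.path(work_dir, "extracted")
dir.create(out_dir)
untar(archive, files = members[1], exdir = out_dir)

# Read the extracted VCF. Lines starting with '#' are skipped because
# '#' is read.table's comment character; quote = "" avoids treating
# quote characters in the data as string delimiters.
vcf <- read.table(file.path(out_dir, members[1]),
                  sep = "\t", header = FALSE, comment.char = "#",
                  quote = "", stringsAsFactors = FALSE)
print(vcf)
```

If the file is renamed or re-compressed as a plain `test_set_13.vcf.gz` (gzip only, no tar), the original `read.table(gzfile(...))` call should work without the extra row, and `skipNul = TRUE` should no longer be needed, since the NUL bytes it was hiding come from the tar header and padding.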
