在下面的awk
中,有一种方法可以只处理模式#CHROM
下面的行,但是在输出中打印全部。我遇到的问题是,如果我忽略了输出中打印的#
的所有行,但没有#
的其他行重复。在我的数据文件中有数千行,但下面只有awk
更新了下面的oone格式。谢谢你:)。
档案 tab-delimited
##bcftools_normVersion=1.3.1+htslib-1.3.1
##bcftools_normCommand=norm -m-both -o genome_split.vcf genome.vcf.gz
##bcftools_normCommand=norm -f /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta -o genome_annovar.vcf genome_split.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224
AWK
awk '!/^#/
BEGIN {FS = OFS = "\t"
}
NF == 10 {
split($8, a, /[=;]/)
$11 = $12 = $13 = $14 = $15 = $18 = "."
$16 = (a[1] == "DP") ? a[2] : "DP=num_Missing"
$17 = "homref"
}
1' out > ref
curent输出 tab-delimited
##bcftools_normVersion=1.3.1+htslib-1.3.1
##bcftools_normCommand=norm -m-both -o genome_split.vcf genome.vcf.gz
##bcftools_normCommand=norm -f /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta -o genome_annovar.vcf genome_split.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224 --- duplicated line ---
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224 . . . . . 159 homref . --- this line is correct ---
所需的输出 tab-delimited
##bcftools_normVersion=1.3.1+htslib-1.3.1
##bcftools_normCommand=norm -m-both -o genome_split.vcf genome.vcf.gz
##bcftools_normCommand=norm -f /home/cmccabe/Desktop/NGS/picard-tools-1.140/resources/ucsc.hg19.fasta -o genome_annovar.vcf genome_split.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 948797 . C . 0 PASS DP=159;END=948845;MAX_DP=224;MIN_DP=95 GT:DP:MIN_DP:MAX_DP 0/0:159:95:224 . . . . . 159 homref .
答案 0 :(得分:1)
您的第一个声明:
/^#/
说“打印以#
开头的每一行”和您的上一行:
1
说“打印每一行”。因此输出中有重复的行。
要仅修改不以#
开头但打印所有行的行:
!/^#/ { do stuff }
1