A very simple question: how do I filter a tab-delimited file that contains many columns?
The file looks like this:
chr10 100008748 100010821 . . - 1 1 1 1 1 3 0 0 3 5 13 2 3 11 1 4
chr10 100010933 100011322 . . - 1 1 1 1 1 0 2 0 5 3 11 0 0 6 1 4
chr10 100010954 100011322 . . - 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0
chr10 100011459 100012109 . . - 1 1 1 1 1 1 0 4 8 2 17 4 3 11 2 2
chr10 100011959 100015344 . . + 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
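The filter I need to apply is 10: to be exact, I need to see a value of 10 or more in the columns. I tried the following script, but it does not work: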
cat infile.txt \
> while read line \
> do \
> ext=`echo $line | cut -f11-` \
> if [ "$ext" >= "10" \] \
> then \
> echo $line \
> fi \
> done > outfile.txt
What should I do?
Error messages:
cat: while: No such file or directory
cat: read: No such file or directory
cat: line: No such file or directory
cat: do: No such file or directory
cat: ext=: No such file or directory
cat: if: No such file or directory
cat: [: No such file or directory
cat: : No such file or directory
cat: 10: No such file or directory
cat: ]: No such file or directory
cat: then: No such file or directory
cat: echo: No such file or directory
cat: fi: No such file or directory
cat: done: No such file or directory
Answer 0 (score: 1)
You can use awk:
awk -F'\t' '{for (i=11; i<=22; i++) if ($i>=10) {print; break}}' file
chr10 100008748 100010821 . . - 1 1 1 1 1 3 0 0 3 5 13 2 3 11 1 4
chr10 100010933 100011322 . . - 1 1 1 1 1 0 2 0 5 3 11 0 0 6 1 4
chr10 100011459 100012109 . . - 1 1 1 1 1 1 0 4 8 2 17 4 3 11 2 2
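As a minimal sketch, the same idea can be wrapped in a small shell function so that the threshold and the first column to inspect are parameters. The function name filter_any and its arguments are hypothetical and not part of the original answer. The loop runs to NF instead of a fixed 22, and the comparison is inclusive (>=) to match "10 or more" in the question.

```shell
# Hypothetical helper: keep rows where any column from `first` onward is >= `min`.
# $1 = threshold, $2 = first column to inspect; rows are read from stdin.
filter_any() {
    awk -F'\t' -v min="$1" -v first="$2" '{
        for (i = first; i <= NF; i++)
            if ($i + 0 >= min) { print; break }   # $i + 0 forces a numeric comparison
    }'
}

# Shortened, made-up sample: two tab-separated rows, inspecting columns 2 onward.
# Only the first row contains a value of 10 or more, so only it is printed.
printf 'a\t1\t13\nb\t0\t2\n' | filter_any 10 2
```

For the file in the question, the call would be something like filter_any 10 11 < infile.txt > outfile.txt, assuming the count columns really start at column 11.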
Answer 1 (score: 1)
You can try this:
awk 'BEGIN {FS="\t"} {for (i=11;i<=NF;i++) {printf "%s\t", $i }}{printf "\n"}' file.txt
Here:
1. FS="\t" sets the field separator to a tab.
2. The for loop starts at i=11 and runs through the last field (NF).
3. printf "%s\t", $i prints field i followed by a tab and stays on the same line, because printf adds no newline. Passing the field through "%s" keeps a stray % in the data from being read as a format specifier.
4. The final printf "\n" ends the output line after each input line.
The sample output would be:
1 3 0 0 3 5 13 2 3 11 1 4
1 0 2 0 5 3 11 0 0 6 1 4
0 0 0 0 0 1 0 0 0 0 0 0
1 1 0 4 8 2 17 4 3 11 2 2
0 0 0 0 0 0 0 0 0 0 0 0
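The command above only extracts columns 11 onward and does not apply the threshold. As a rough sketch, the extraction can be combined with the test in a plain shell loop, which is closer to what the question attempted. The function name filter_loop is hypothetical, and the sketch assumes every value from column 11 onward is an integer. It has no trailing backslashes, so the loop keywords are not passed to cat as file names, and it uses the integer test -ge, because [ has no >= operator.

```shell
# Hypothetical helper: read tab-separated rows from stdin and print each row
# whose columns 11 onward contain at least one integer >= the threshold in $1.
filter_loop() {
    min=$1
    while IFS= read -r line; do
        # Quoting "$line" preserves the tabs, so cut can split on them
        ext=$(printf '%s\n' "$line" | cut -f11-)
        keep=0
        # Unquoted $ext is split on whitespace into the individual values
        for v in $ext; do
            if [ "$v" -ge "$min" ]; then
                keep=1
                break
            fi
        done
        if [ "$keep" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Made-up 12-column row: column 12 holds 11, so the row is kept
printf 'x\t.\t.\t.\t.\t.\t.\t.\t.\t.\t3\t11\n' | filter_loop 10
```

With the file from the question, this would be run as filter_loop 10 < infile.txt > outfile.txt. The awk versions will usually be faster, because this loop starts a cut process for every line.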