Filtering multiple columns of a tab-delimited file?

Time: 2014-09-12 06:33:14

Tags: bash shell

A very simple question: how do I filter a tab-delimited file that has many columns?

The file looks like this:

chr10   100008748   100010821   .   .   -   1   1   1   1   1   3   0   0   3   5   13  2   3   11  1   4
chr10   100010933   100011322   .   .   -   1   1   1   1   1   0   2   0   5   3   11  0   0   6   1   4
chr10   100010954   100011322   .   .   -   0   1   0   1   0   0   0   0   0   1   0   0   0   0   0   0
chr10   100011459   100012109   .   .   -   1   1   1   1   1   1   0   4   8   2   17  4   3   11  2   2
chr10   100011959   100015344   .   .   +   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0

The filter I need to apply is 10: to be precise, I need to see 10 or more in all the columns. I tried the following script, but it does not work:

cat infile.txt \
> while read line \
> do \
> ext=`echo $line | cut -f11-` \
> if [ "$ext" >= "10" \] \
> then \
> echo $line \
> fi \
> done > outfile.txt

What should I do?

Error messages:

cat: while: No such file or directory
cat: read: No such file or directory
cat: line: No such file or directory
cat: do: No such file or directory
cat: ext=: No such file or directory
cat: if: No such file or directory
cat: [: No such file or directory
cat: : No such file or directory
cat: 10: No such file or directory
cat: ]: No such file or directory
cat: then: No such file or directory
cat: echo: No such file or directory
cat: fi: No such file or directory
cat: done: No such file or directory

2 answers:

Answer 0: (score: 1)

You can use awk:

awk -F'\t' '{for (i=11; i<=22; i++) if ($i>=10) {print; break}}' file
chr10   100008748   100010821   .   .   -   1   1   1   1   1   3   0   0   3   5   13  2   3   11  1   4
chr10   100010933   100011322   .   .   -   1   1   1   1   1   0   2   0   5   3   11  0   0   6   1   4
chr10   100011459   100012109   .   .   -   1   1   1   1   1   1   0   4   8   2   17  4   3   11  2   2
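
If the requirement is read strictly, so that every count column (field 11 through the last) must be 10 or more, a variant along these lines might work. This is only a sketch: the file names `sample.tsv` and `all_ge10.tsv` and the two rows are made up for illustration, and the assumption that the counts start at field 11 is taken from the command above.

```shell
# Build a tiny tab-separated sample (hypothetical data, not from the question).
printf 'chrA\t1\t2\t.\t.\t-\t1\t1\t1\t1\t12\t10\t15\n' >  sample.tsv
printf 'chrB\t3\t4\t.\t.\t+\t1\t1\t1\t1\t12\t9\t15\n'  >> sample.tsv

# Keep a row only if every field from 11 to the last is >= 10.
awk -F'\t' '{
    keep = 1
    for (i = 11; i <= NF; i++)
        if ($i + 0 < 10) { keep = 0; break }   # $i + 0 forces a numeric comparison
    if (keep && NF >= 11) print                # skip rows that have no count columns
}' sample.tsv > all_ge10.tsv

cat all_ge10.tsv
```

Here only the `chrA` row survives, because the `chrB` row has a 9 in field 12.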

Answer 1: (score: 1)

You can try this:

 awk 'BEGIN {FS="\t"} {for (i=11;i<=NF;i++) {printf "%s\t", $i}} {printf "\n"}' file.txt

Here:

 1. FS="\t" sets the field separator to a tab.
 2. The for loop starts at i=11 and runs through the last field (NF).
 3. printf "%s\t", $i prints field i followed by a tab, staying on the same line because printf does not add a newline.
 4. The final printf "\n" ends the output line for each input line.

The sample output will be:

 1       3       0       0       3       5       13      2       3       11      1       4
 1       0       2       0       5       3       11      0       0       6       1       4
 0       0       0       0       0       1       0       0       0       0       0       0
 1       1       0       4       8       2       17      4       3       11      2       2
 0       0       0       0       0       0       0       0       0       0       0       0
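
As for the errors in the question: the trailing backslashes join every line into one long `cat` command, so `while`, `read`, `do` and the rest are passed to `cat` as file names. Inside `[ ]`, `>=` is also not a comparison operator; the shell treats `>` as a redirection. A pure-shell version might look like the sketch below. It assumes a POSIX-style shell such as bash, that the counts are in fields 11 onward, and that a row should be kept when any of them is 10 or more. The two-line `infile.txt` is invented for the example.

```shell
# Hypothetical two-row input; the real file has 22 tab-separated fields.
printf 'chrA\t1\t2\t.\t.\t-\t1\t1\t1\t1\t0\t13\t4\n' >  infile.txt
printf 'chrB\t3\t4\t.\t.\t+\t1\t1\t1\t1\t0\t9\t4\n'  >> infile.txt

# IFS= and -r keep each line intact (tabs and backslashes preserved).
while IFS= read -r line; do
    # Quote "$line" so the tabs survive; cut splits on tabs by default.
    counts=$(printf '%s\n' "$line" | cut -f11-)
    keep=0
    for n in $counts; do            # word-split the counts on whitespace
        if [ "$n" -ge 10 ]; then    # -ge is the numeric ">=" test
            keep=1
            break
        fi
    done
    if [ "$keep" -eq 1 ]; then
        printf '%s\n' "$line"
    fi
done < infile.txt > outfile.txt

cat outfile.txt
```

For large files an awk one-liner is likely to be much faster, since this loop starts a `cut` process for every line.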