Question

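I want to extract a line from a text file if a specified column of that line contains a given word. How can I do this with a unix one-liner? Possibly using cat, echo, cut, grep, etc.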

I have a text file in this format:

#SentenceID<tab>Sentence1<tab>Sentence2<tab>Other_unknown_number_of_columns<tab> ...

Here is an example of the text file:

021348  this is the english sentence with coach .   c'est la phrase française avec l'entraîneur .   And then there are several nonsense columns like these  .
923458  this is a another english sentence without the word .   c'est une phrase d'une autre anglais sans le bus mot .  whatever foo bar    nonsense columns    2134234 $%^&

If the word I'm looking for in the second column is coach, the command should output:

021348  this is the english sentence with coach .   c'est la phrase française avec l'entraîneur .   And then there are several nonsense columns like these  .

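I can do this in Python, but I'm looking for a unix command or something similar:

# write every line whose second tab-separated column contains the word "coach"
with open('in.txt') as infile, open('out.txt', 'w') as outfile:
    for line in infile:
        columns = line.rstrip('\n').split('\t')
        if len(columns) > 1 and 'coach' in columns[1].split():
            outfile.write(line)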

Answer 1

How about this?

awk -F'\t' '{if ($2 ~ "coach") print}' your_file

-F'\t' -> sets the field separator to a tab.
$2 ~ "coach" -> looks for "coach" in the second field.
print $0 or print -> prints the whole line.
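For comparison with the Python in the question, here is a rough Python equivalent of that awk one-liner, assuming the columns really are tab-separated. Two details carry over from awk: fields are numbered from 1, so $2 becomes index 1 after splitting, and ~ is an unanchored regular-expression match, so a second column containing "coaches" would also be selected. The function name awk_like_match is hypothetical, and Python's re syntax is close to, but not identical to, awk's POSIX extended regular expressions.

```python
import re
from typing import Iterable, Iterator


def awk_like_match(lines: Iterable[str], field: int, pattern: str) -> Iterator[str]:
    """Yield each line whose `field`-th tab-separated column (1-based, like awk's $N)
    contains a match for the regular expression `pattern` (unanchored, like awk's ~).
    Hypothetical helper, written only to illustrate the awk semantics."""
    if field < 1:
        raise ValueError("field numbers start at 1 ($0 is not handled here)")
    regex = re.compile(pattern)
    for line in lines:
        columns = line.rstrip("\n").split("\t")  # like -F'\t'
        # awk treats a field beyond NF as an empty string
        value = columns[field - 1] if field <= len(columns) else ""
        if regex.search(value):  # like $2 ~ "coach"
            yield line  # like print (the whole line, $0)


sample = [
    "021348\tthis is the english sentence with coach .\tc'est la phrase française avec l'entraîneur .\n",
    "923458\tthis is a another english sentence without the word .\tc'est une phrase d'une autre anglais sans le bus mot .\n",
]

for line in awk_like_match(sample, 2, "coach"):
    print(line, end="")  # prints only the 021348 line
```

As with awk, a line with fewer columns than requested simply does not match instead of raising an error.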

Edit

sudo_O suggested the following, which is even shorter:

awk -F'\t' '$2~/coach/' file
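The shorter form works because an awk pattern with no action prints the matching line by default. Both forms still do a substring match, though: $2~/coach/ also selects a line whose second column contains "coaches" or "coachman", whereas the Python in the question splits on whitespace and tests for the whole word. The minimal Python sketch below (hypothetical helper names) shows the difference. In awk, a whole-word test could be written with a portable pattern such as $2 ~ /(^| )coach( |$)/.

```python
import re


def contains_substring(field: str, word: str) -> bool:
    # Like awk's `$2 ~ /coach/`: any occurrence, even inside a longer word
    return re.search(re.escape(word), field) is not None


def contains_word(field: str, word: str) -> bool:
    # Like the question's `"coach" in ... .split()`: whitespace-delimited tokens only
    return word in field.split()


field = "the coaches arrived ."
print(contains_substring(field, "coach"))  # True
print(contains_word(field, "coach"))       # False
```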

Answer 2

For this kind of task I always use awk:

awk -F'\t' '$2 ~ /coach/ {print $0;}' < textfile

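You can access any column with $x, and $0 contains the whole line. The test is done with a regular expression; here it is very simple, but that makes it very powerful once your requirements get more complex.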
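The $x / $0 convention maps onto a split list with an offset of one. The sketch below uses a hypothetical awk_field helper and assumes tab-separated input, as with -F'\t': field 0 is the whole record, fields 1 to NF are the columns, and any field past NF is an empty string, which is how awk treats it.

```python
def awk_field(record: str, n: int, sep: str = "\t") -> str:
    """Return awk's $n for `record` (trailing newline already removed),
    splitting on `sep`. Hypothetical helper for illustration only."""
    if n < 0:
        raise ValueError("awk has no negative field numbers")
    if n == 0:
        return record  # $0 is the whole line
    fields = record.split(sep)
    # $1..$NF are the columns; awk yields "" for fields beyond NF
    return fields[n - 1] if n <= len(fields) else ""


record = "021348\tthis is the english sentence with coach .\tc'est la phrase française avec l'entraîneur ."
print(awk_field(record, 1))            # 021348
print(awk_field(record, 0) == record)  # True
print(repr(awk_field(record, 9)))      # ''
```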
