Question

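I have a text file (let's call it keywords.txt) containing a number of strings separated by newlines (although this isn't set in stone; I can use spaces, commas, or whatever is most appropriate). I also have a number of other text files (which I'll collectively call input.txt).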

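What I want to do is iterate over each line of input.txt and test whether that line contains one of the keywords. Then, depending on which input file I'm working on at the time, I need to either copy the matching lines from input.txt to output.txt and ignore the non-matching ones, or copy the non-matching lines and ignore the matching ones.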

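I've searched for a solution, but although I've found ways to do parts of what I want, I haven't found a way to do everything I'm asking for. I could try to combine the various solutions I found, but my main concern is that I'd end up wondering whether the code I wrote is the best way to do this.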

Here is an excerpt of what I currently have in keywords.txt:

google
adword
chromebook.com
cobrasearch.com
feedburner.com
doubleclick
foofle.com
froogle.com
gmail
keyhole.com
madewithcode.com

Here is a sample of what can be found in one of my input.txt files:

&expandable_ad_
&forceadv=
&gerf=*&guro=
&gIncludeExternalAds=
&googleadword=
&img2_adv=
&jumpstartadformat=
&largead=
&maxads=
&pltype=adhost^

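In this snippet, &googleadword= is the only line that matches the filter. Depending on the case, output.txt should contain either only the matching lines or only the lines that match none of the keywords.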

Answer 1

1. Assuming the contents of keywords.txt are separated by newlines:

google
adword
chromebook.com
...

The following will work:

# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
grep -Ff keywords.txt input.txt > output.txt

# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vFf keywords.txt input.txt > output.txt
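
As a quick check of the two commands above, here is a minimal sketch that runs them on a few lines taken from the question. The scratch directory and the file names matched.txt and unmatched.txt are assumptions made for illustration only.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A few keywords and input lines copied from the question.
printf '%s\n' 'google' 'adword' 'doubleclick' > keywords.txt
printf '%s\n' '&forceadv=' '&googleadword=' '&maxads=' > input.txt

# Keep only the lines that contain at least one keyword.
grep -Ff keywords.txt input.txt > matched.txt

# Keep only the lines that contain none of the keywords.
grep -vFf keywords.txt input.txt > unmatched.txt

cat matched.txt     # the single line &googleadword=
cat unmatched.txt   # the remaining two lines
```

Because -F treats every keyword as a literal string, characters such as the dot in chromebook.com have no special meaning here.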

2. Assuming the contents of keywords.txt are separated by vertical bars:

google|adword|chromebook.com|...

The following will work:

# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
grep -Ef keywords.txt input.txt > output.txt

# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vEf keywords.txt input.txt > output.txt
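
One thing to watch with -E: the keywords are now regular expressions, so the dot in chromebook.com matches any character. The sketch below shows the difference and one possible workaround, escaping the dots with sed. The sample lines and the file name keywords_escaped.txt are made up for illustration, and only dots are escaped; other regex metacharacters would need the same treatment.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'google|chromebook.com' > keywords.txt
printf '%s\n' 'www.chromebook.com' 'chromebookXcom' 'example.org' > input.txt

# As a regex, "chromebook.com" also matches "chromebookXcom".
grep -Ef keywords.txt input.txt > regex_matches.txt

# Escape each dot so it only matches a literal dot.
sed 's/\./\\./g' keywords.txt > keywords_escaped.txt
grep -Ef keywords_escaped.txt input.txt > literal_matches.txt

cat regex_matches.txt     # two lines
cat literal_matches.txt   # only www.chromebook.com
```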

3. Assuming the contents of keywords.txt are separated by commas:

google,adword,chromebook.com,...

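There are many ways to achieve the same thing, but one simple way is to use tr to replace every comma with a vertical bar and then have grep interpret the resulting pattern as an extended regular expression.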

# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
# (the quotes keep the shell from word-splitting or glob-expanding the pattern)
grep -E "$(tr ',' '|' < keywords.txt)" input.txt > output.txt

# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vE "$(tr ',' '|' < keywords.txt)" input.txt > output.txt
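
If the keywords should be matched literally rather than as regular expressions, another option is to turn the commas into newlines and reuse the -F approach from case 1. This is only a sketch; the intermediate file name keywords_lines.txt and the sample data are assumptions.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'google,adword,chromebook.com' > keywords.txt
printf '%s\n' '&forceadv=' '&googleadword=' 'chromebookXcom' > input.txt

# Put each keyword on its own line, the layout grep -f expects.
tr ',' '\n' < keywords.txt > keywords_lines.txt

# Match the keywords as fixed strings; the dot is now literal.
grep -Ff keywords_lines.txt input.txt > output.txt

cat output.txt   # only &googleadword=
```

Note that a trailing comma in keywords.txt would produce an empty pattern line, which matches every input line.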

Grep options

 -v, --invert-match
       Selected lines are those not matching any of the specified patterns.   

 -F, --fixed-strings
       Interpret each data-matching pattern as a list of fixed strings, 
       separated by newlines, instead of as a regular expression.

 -E, --extended-regexp
       Interpret pattern as an extended regular expression
       (i.e. force grep to behave as egrep).

 -f file, --file=file
       Read one or more newline separated patterns from file.
       Empty pattern lines match every input line.
       Newlines are not considered part of a pattern.
       If file is empty, nothing is matched.
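
The note about empty pattern lines is easy to trip over: a single blank line in keywords.txt makes every input line match. A minimal sketch of the effect, and of stripping blank lines first, follows. The file names keywords_clean.txt and all_count.txt are assumptions for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# keywords.txt contains a blank line between the two keywords.
printf '%s\n' 'google' '' 'adword' > keywords.txt
printf '%s\n' '&forceadv=' '&googleadword=' > input.txt

# The empty pattern matches everything, so both lines are counted.
grep -cFf keywords.txt input.txt > all_count.txt

# Drop blank lines from the keyword list before using it.
grep -v '^$' keywords.txt > keywords_clean.txt
grep -Ff keywords_clean.txt input.txt > output.txt

cat all_count.txt   # number of lines matched with the blank pattern present
cat output.txt      # lines matched after cleaning
```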

Read more about grep

Read more about tr
