Question

I have a txt file containing several lines of text, for example:

This is a
file containing several
lines of text.

Now I have another file that contains only words, like this:

this
contains
containing
text

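Now I want to output the words that are in file 1 but not in file 2. I tried the following:

xargs -n1 to put each space-separated substring on its own line.

tr -d '[:punct:]' to remove punctuation.

sort and uniq to create a sorted file for use with comm, which I ran with the -i flag to make it case-insensitive.

But this doesn't work. I looked around online and found similar questions, but I couldn't figure out what I was doing wrong. Most answers to those questions work with two already-sorted files from which newlines, spaces, and punctuation have been removed, whereas my file_1 may contain any of these to begin with.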

Desired output:

is
a
file
several
lines
of

Answer 1

I would try a more direct approach:

[command not preserved in the source]

Flags used for grep: q for quiet (no output needed), w for word match.
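The command itself is not reproduced in this answer. A minimal sketch consistent with the flags described, assuming a loop that tests each word of the first file against the word list, might look like the following. The file names, the helper name `not_in_list`, and the extra `-i` and `-F` flags are assumptions for illustration, not the answerer's exact code.

```shell
#!/bin/sh
# Recreate the sample data from the question so the sketch runs on its own.
# (workdir, file1 and file2 are placeholder names.)
workdir=$(mktemp -d)
printf 'This is a\nfile containing several\nlines of text.\n' > "$workdir/file1"
printf 'this\ncontains\ncontaining\ntext\n' > "$workdir/file2"

# not_in_list TEXT_FILE WORD_LIST
# Prints every word of TEXT_FILE that has no whole-word match in WORD_LIST.
not_in_list() {
    # Strip punctuation, then put each whitespace-separated word on its own line.
    tr -d '[:punct:]' < "$1" | tr -s '[:space:]' '\n' |
    while IFS= read -r word; do
        [ -n "$word" ] || continue
        # -q: quiet, -w: whole word, -i: ignore case, -F: treat the word literally
        grep -qwiF -- "$word" "$2" || printf '%s\n' "$word"
    done
}

not_in_list "$workdir/file1" "$workdir/file2"
```

This runs grep once per word, so it is simple rather than fast; for large inputs a single grep or awk pass scales better.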

Answer 2

One in awk:

$ awk -F"[^A-Za-z]+" '          # anything but a letter is a field delimiter
NR==FNR {                       # process the word list
    a[tolower($0)]
    next
}
{
    for(i=1;i<=NF;i++)          # loop all fields
        if($i!="" && !(tolower($i) in a)) # skip empty fields (e.g. after a trailing "."); word not in the word list
            print $i            # print it. duplicates are printed also.
}' another_file txt_file

Output:

is
a
file
several
lines
of
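The comment in the awk script notes that duplicates are printed too. If that is unwanted, one possible variation (a sketch, not part of the original answer) tracks the words already printed in a second array. The names `listed`, `seen`, and `unique_missing_words`, and the small sample files, are made up for illustration.

```shell
#!/bin/sh
# Small sample with a repeated word, to show the de-duplication.
workdir=$(mktemp -d)
printf 'This is a\nfile containing a file\n' > "$workdir/txt_file"
printf 'this\ncontaining\n' > "$workdir/another_file"

# unique_missing_words TEXT_FILE WORD_LIST
# Prints each word of TEXT_FILE that is not in WORD_LIST, once, in first-seen order.
unique_missing_words() {
    awk -F'[^A-Za-z]+' '
    NR == FNR { listed[tolower($0)]; next }   # first file: remember the word list
    {
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            # skip empty fields, listed words, and words already printed
            if (w != "" && !(w in listed) && !seen[w]++)
                print $i
        }
    }' "$2" "$1"
}

unique_missing_words "$workdir/txt_file" "$workdir/another_file"
```

Comparison is done on the lowercased word, so `File` and `file` count as the same word; the first spelling encountered is the one printed.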

With grep:

$ grep -vwi -f another_file <(tr -s -c 'a-zA-Z' '\n' < txt_file)
is
a
file
several
lines
of

Answer 3

paste + grep approach:

grep -Eiv "($(paste -sd'|' <file2.txt))" <(grep -wo '\w*' file1.txt)

Output:

is
a
file
several
lines
of

Answer 4

This pipeline takes the original file, replaces spaces with newlines, converts everything to lowercase, and then uses grep to filter out (-v) whole words (-w), case-insensitively (-i), using the lines in the given file (-f file2).
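The pipeline itself is not reproduced in this answer. A minimal sketch that follows the description step by step, assuming `tr` is used for both the splitting and the lowercasing, could look like this. The file names and the `result` variable are placeholders, and the exact commands are an assumption rather than the answerer's original.

```shell
#!/bin/sh
# Recreate the sample data from the question so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'This is a\nfile containing several\nlines of text.\n' > "$workdir/file1"
printf 'this\ncontains\ncontaining\ntext\n' > "$workdir/file2"

# Step 1: replace spaces with newlines (one word per line).
# Step 2: convert to lowercase.
# Step 3: grep -v (invert), -w (whole word), -i (ignore case),
#         -f (read the patterns from the word list).
result=$(tr -s ' ' '\n' < "$workdir/file1" |
         tr '[:upper:]' '[:lower:]' |
         grep -vwi -f "$workdir/file2")

printf '%s\n' "$result"
```

Punctuation is not stripped here, but `text.` is still filtered out because -w treats the trailing period as a word boundary. A word with attached punctuation that is not in the list would be printed with its punctuation intact.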
