Question

I am reading strings from the file appliances_list.txt.

appliances_list.txt contains

fridge
dryer
ironbox
microwave

The file I am searching is myappliances.txt. Its contents are

I have a <Appliance>fridge</Appliance>
I have another <Appliance>fridge</Appliance>
I have a <Appliance>refridgerator</Appliance>
I have a <Appliance>microwave</Appliance>
I have <Appliance>ironbox</Appliance> at home
I have another <Appliance>microwave</Appliance>
I have a <Appliance>hairdryer</Appliance>

I am using

grep -o -m1 -f appliances_list.txt myappliances.txt

Output

fridge

The output I want is the first occurrence of each string (exact match):

fridge
microwave
ironbox

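Can someone point me in the right direction? Note that myappliances.txt is only a sample file. My real file is 2 GB, so I need an optimized solution: something that, once the first match for String1 is found, stops searching for String1 and moves on to String2.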

This is not a duplicate of "Read string from one file, grep the first occurrence in another file". The pattern in myappliances.txt is different in the two cases.

Answer 1

$ cat tst.awk
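# Split each line on an opening or closing <Appliance> tag, so $2 is the tagged text.
BEGIN { FS="</?Appliance>" }
# First file (the list): remember each string and count how many remain.
NR==FNR { strings[$0]; ++numStrings; next }
# Second file: report a string the first time it appears as the tag content.
$2 in strings {
    print $2
    delete strings[$2]          # never report this string again
    if (--numStrings == 0) {    # every listed string found: stop reading early
        exit
    }
}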

$ awk -f tst.awk appliances_list.txt myappliances.txt
fridge
microwave
ironbox
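
To try the script without creating the files by hand, the sketch below recreates the sample data from the question in a temporary directory and runs the same awk program inline. The temporary directory and the `result` variable are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Illustrative reproduction of the sample run; tmpdir and result are assumed names.
tmpdir=$(mktemp -d)

# The list of strings to look for (from the question).
cat > "$tmpdir/appliances_list.txt" <<'EOF'
fridge
dryer
ironbox
microwave
EOF

# The file to search (from the question).
cat > "$tmpdir/myappliances.txt" <<'EOF'
I have a <Appliance>fridge</Appliance>
I have another <Appliance>fridge</Appliance>
I have a <Appliance>refridgerator</Appliance>
I have a <Appliance>microwave</Appliance>
I have <Appliance>ironbox</Appliance> at home
I have another <Appliance>microwave</Appliance>
I have a <Appliance>hairdryer</Appliance>
EOF

# Same program as tst.awk, passed inline instead of with -f.
result=$(awk '
BEGIN { FS = "</?Appliance>" }
NR == FNR { strings[$0]; ++numStrings; next }
$2 in strings {
    print $2
    delete strings[$2]
    if (--numStrings == 0) exit
}
' "$tmpdir/appliances_list.txt" "$tmpdir/myappliances.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because `$2` is compared as a whole array key, `refridgerator` and `hairdryer` are not reported. Note that the early `exit` only fires once every listed string has been found. In this sample `dryer` never appears as an exact tag value, so awk reads myappliances.txt to the end.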

Answer 2

This might work for you (GNU sed):

sed -r 's#.*#/\\<&\\>/{s/.*/&/;G;/^([^\\n]*)\\n.*\\1/!P;h}#' list |
sed -rf - -e 'd' file

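This creates a sed script from the list file and runs it against the text file.

The generated script stores matches in the hold space and prints a match only if it has not been seen before.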
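
To see what the first sed command produces, it can be fed a single list entry. This is a small sketch that assumes GNU sed (for `-r`, `\<` and `\>`); the one-word input and the `generated` variable are illustrative.

```shell
#!/bin/sh
# Feed one list entry through the generator to inspect the resulting sed script line.
# Requires GNU sed; "fridge" and the variable name are illustrative.
generated=$(printf 'fridge\n' |
    sed -r 's#.*#/\\<&\\>/{s/.*/&/;G;/^([^\\n]*)\\n.*\\1/!P;h}#')

printf '%s\n' "$generated"
# prints: /\<fridge\>/{s/.*/fridge/;G;/^([^\n]*)\n.*\1/!P;h}
```

Reading the generated line left to right: the address `/\<fridge\>/` selects lines containing `fridge` as a whole word, `s/.*/fridge/` reduces the line to that word, `G` appends the hold space (the words seen so far), `/^([^\n]*)\n.*\1/!P` prints the word only if it does not already occur in the appended history, and `h` saves the updated history. The generated script has no quit command, so unlike the awk answer it appears to read the text file to the end even after every string has been found.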
