Question

I am using this command: awk 'NR==FNR{a[$0];next}!($0 in a)' spellingword.txt /tmp/userwords.txt to compare two files. I want to find the differences and then convert them into a numeric value.

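For example, when the two files are compared, userwords.txt returns three words that do not match spellingword.txt, so three lines are printed, one per word. Now I want to take that output and turn it into the number 3.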

Update:

Spellingword.txt 
tall
ball
fall
wall
paul

Userword.txt
tall
ball
fall
wall
pall

The end user misspelled paul. Now, when comparing the files, I get this result:

pall}

Then, using the grep -Rl "curl" ./ | wc -l command, I get a result of 2 when it should be 1. Where does the } come from? Any ideas? Or am I approaching this all wrong?

Answer 1

My userwords.txt (annotated):

tall  # match
ball  # match  
fall  # match
wall  # match
pall  # no match
paul} # partial match

Code:

$ awk '                      
NR==FNR {                    # hash the first file
    a[$1]
    next
}
{
    if($1 in a)               # search for full match
        next                  # skip to next record if there was a match, else:
    for(i in a)               # loop thru all entries in hash
        if($1 ~ i || i ~ $1)  # search for partial match
            next              # skip to next record if there was a match, else
    c++                       # count misses
} 
END { 
    print c + 0               # print miss count (c + 0 yields 0 when there were no misses)
}' spellingword.txt /tmp/userwords.txt
1                             # this was the output for "pall"

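The only improvement is the search for a "partial match": comparing paul with paul} counts as a match, whereas a clear misspelling such as pual still does not match paul. If you want to catch those as well, I suggest trying the approximate pattern matching tool agrep and using it with suitable parameters to detect misspellings.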
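
If only exact mismatches need to be counted, without the partial-match pass, a simpler variant may be enough. The following is a minimal sketch, not a definitive solution: it rebuilds the sample word lists from the question in a temporary directory (the temporary paths and the names seen, c and count are assumptions made for illustration) and lets awk do the counting in its END block, so no separate wc -l step is needed.

```shell
# Hypothetical sample data in a temporary directory (paths are assumptions).
tmp=$(mktemp -d)
printf '%s\n' tall ball fall wall paul > "$tmp/spellingword.txt"
printf '%s\n' tall ball fall wall pall > "$tmp/userwords.txt"

# Count the lines of the second file that are absent from the first.
count=$(awk 'NR==FNR { seen[$0]; next }   # remember every reference word
             !($0 in seen) { c++ }        # count words with no exact match
             END { print c + 0 }          # c + 0 prints 0 when nothing was missed
            ' "$tmp/spellingword.txt" "$tmp/userwords.txt")

echo "$count"
rm -r "$tmp"
```

With the sample lists above, only pall is missing from the reference file, so the script prints 1. Because the result is captured in a shell variable, it can be reused directly in later arithmetic or tests.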
