Question

I need a regular expression for sed (sed only) that tells me whether some word appears three times in a line and, if so, prints that line.

Let's say this is the file:

abc abc gh abc
abcabc abc
 ab ab cd ab xx ab
ababab cc ababab
abab abab cd abab

So the output should be:

abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This is what I tried:

sed -n '/\([^ ]\+\)[ ]+\1\1\1/p' $1

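It doesn't work... :/ What am I doing wrong?

It doesn't matter whether the word is at the beginning of the line, and the occurrences don't need to be consecutive.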

Answer 1

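You need to add .* between the \1 back-references. As written, \1\1\1 only matches three back-to-back copies of the word, and [ ]+ matches a space followed by a literal + (in a basic regular expression the quantifier is written \+ in GNU sed):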

$ sed -n '/\b\([^ ]\+\)\b.*\b\1\b.*\b\1\b/p' file
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

I'm assuming your input contains only spaces and word characters.
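
To make the difference concrete, here is a minimal sketch that runs both patterns. It assumes GNU sed, since \b, \+ and back-references under -E are GNU extensions, and the function name three_times is hypothetical (it is not part of the answer above). The -E form is simply the same pattern written with extended-regex syntax.

```shell
# Hypothetical helper: print lines in which a whole word occurs at least three times.
# Assumes GNU sed (\b and back-references with -E are GNU extensions).
three_times() {
    sed -nE '/\b([^ ]+)\b.*\b\1\b.*\b\1\b/p' "$@"
}

# The pattern from the question matches a word, one space, a literal "+",
# and then three back-to-back copies of the word, so only the first line passes.
printf '%s\n' 'ab +ababab' 'ab ab ab ab' | sed -n '/\([^ ]\+\)[ ]+\1\1\1/p'

# The corrected pattern applied to the sample input from the question.
printf '%s\n' 'abc abc gh abc' 'abcabc abc' ' ab ab cd ab xx ab' \
    'ababab cc ababab' 'abab abab cd abab' | three_times
```

Wrapping the command in a function keeps the pattern in one place; it accepts file names as arguments or reads standard input when none are given.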

Answer 2

I know the question asks for sed, but every system I've seen that has sed also has awk, so here is an awk solution:

awk -F"[^[:alnum:]]" '{delete a;for (i=1;i<=NF;i++) a[$i]++;for (i in a) if (a[i]>2) {print $0;next}}' file
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This may be easier to understand than the regex solution.

# Set the field separator to any single non-alphanumeric character.
awk -F"[^[:alnum:]]" '{
delete a            # Clear array "a" for the new line
for (i=1;i<=NF;i++) # Loop through the words one by one
    a[$i]++         # Count the occurrences of each word in array "a"
for (i in a)        # Loop through array "a"
    if (a[i]>2) {   # If a word occurs more than two times:
        print $0    # Print the line
        next        # Skip to the next line, so it is not printed twice if another word also occurs three times
    }
}' file             # Read the file
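
One caveat with this field separator: every single non-alphanumeric character splits a field, so leading or consecutive separators produce empty fields, and a line containing three or more of them (for example "a  b  c  d" with double spaces) would also be printed. A minimal sketch of a variant that ignores empty fields, assuming that behavior is unwanted; the function name thrice and the array name seen are hypothetical.

```shell
# Hypothetical helper: print lines in which some word occurs at least three times,
# ignoring the empty fields created by leading or repeated separators.
thrice() {
    awk -F'[^[:alnum:]]+' '{
        split("", seen)                 # portable way to clear the array
        for (i = 1; i <= NF; i++)
            if ($i != "") seen[$i]++    # count only non-empty words
        for (w in seen)
            if (seen[w] > 2) { print; next }   # print once, then move to the next line
    }' "$@"
}

printf '%s\n' 'a  b  c  d' 'ab ab cd ab' | thrice
```

The split("", seen) idiom is used instead of delete on the whole array only because it works in older awk implementations as well.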

Regular expression for repeated words