How to count duplicate sentences in Shell

Date: 2017-12-19 10:03:36

Tags: linux shell awk sed

I have a file like this:

cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected result is shown below. I found a way to produce it, but my code is rather clumsy:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I was wondering if you could show me a more efficient way to achieve this. Thanks.

3 Answers:

Answer 0 (score: 2):

sort + uniq + sed solution:

sort file1.txt | uniq -c | sed -E 's/^ +1 (.+)/\1\n/; 
 s/^ +([2-9]|[0-9]{2,}) (.+)/\2\n<!!! pay attention, the above sentence has repeated \1 times !!!>\n/'

Output:

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, the above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, the above sentence has repeated 2 times !!!>

hig ...

Or with awk:

sort file1.txt | uniq -c | awk '{ n=$1; sub(/^ +[0-9]+ +/,""); 
printf "%s\n%s",$0,(n==1? ORS:"<!!! pay attention, the above sentence has repeated "n" times !!!>\n\n") }'

Answer 1 (score: 2):

$ awk '
    $0==prev { cnt++; next }
    { prt(); prev=$0; cnt=1 }
    END { prt() }
    function prt() {
        if (NR>1) print prev (cnt>1 ? ORS "repeated " cnt " times" : "") ORS
    }
' file
abc bcd abc ...

abcd bcde cdef ...
repeated 3 times

efg fgh ...
repeated 2 times

hig ...

Answer 2 (score: 1):

If your lines are not already grouped, you can use:

awk '
    NR == FNR {count[$0]++; next}   # first pass: count every line
    !seen[$0]++ {                   # second pass: print each line the first time it is seen
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt

If your file is very large, this will consume a lot of memory; you may want to sort it first.
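
For example, putting a sort in front of a grouped-input pass keeps memory use constant, because awk then only has to remember the previous line and a running count rather than a counter for every distinct line. A minimal sketch along those lines (not part of the original answer; the "... repeated N times" marker is only illustrative):

sort file1.txt | awk '
    $0 == prev { cnt++; next }       # still inside a run of identical lines
    { prt(); prev = $0; cnt = 1 }    # a different line starts: report the previous run
    END { prt() }                    # report the last run
    function prt() {
        if (cnt) print prev (cnt > 1 ? "\n... repeated " cnt " times" : "") "\n"
    }
'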