Find the most common words in a text and print the words that share a count on the same line, sorted by count
grep -oE '[[:alpha:]]+' file.txt | sort | uniq -c | sort -nr
This gives:
3 linux
3 fedora
2 ubuntu
2 mandriva
What I am looking for is:
3 linux fedora
2 ubuntu mandriva
Answer 0 (score: 0)
I couldn't manage this as a bash one-liner, but if it works for you, here is a short Python script that does it.
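import os

# Run the shell pipeline; each output line looks like "<count> <word>".
# The raw string keeps \w intact for grep.
pipeline = r"grep -o -E '\w+' file.txt | sort | uniq -c | sort -nr"
preMergedList = os.popen(pipeline).readlines()

# Group the words by count: {count: "word1 word2 "}.
countDict = {}
for line in preMergedList:
    count, word = line.split(None)
    count = int(count.strip())
    word = word.strip()
    if count not in countDict:
        countDict[count] = ""
    countDict[count] += word + " "

# Print one line per count, highest count first (Python 3 syntax).
for count, wordString in sorted(countDict.items(), reverse=True):
    print(count, wordString)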
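If you would rather not shell out to grep at all, the same grouping can be sketched in pure Python. This is only a minimal sketch, not a tested drop-in: the function name group_words_by_count and the sample text are made up for illustration, and the pattern [A-Za-z]+ assumes plain ASCII words, which is narrower than grep's [[:alpha:]] in some locales.

```python
import re
from collections import Counter, defaultdict


def group_words_by_count(text):
    """Return [(count, [words...]), ...] sorted by count, highest first."""
    # Count every run of ASCII letters (assumption: no accented characters).
    counts = Counter(re.findall(r"[A-Za-z]+", text))

    # Invert the mapping: count -> list of words that occur that many times.
    groups = defaultdict(list)
    for word, count in counts.items():
        groups[count].append(word)

    # Highest count first; words within a group are sorted alphabetically.
    return [(count, sorted(words))
            for count, words in sorted(groups.items(), reverse=True)]


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of file.txt.
    sample = ("linux fedora ubuntu linux mandriva "
              "fedora ubuntu linux fedora mandriva")
    for count, words in group_words_by_count(sample):
        print(count, " ".join(words))
```

To run it against a real file, read the text with open("file.txt").read() and pass that in instead of the sample. Note that words within a line come out alphabetically here (fedora before linux), which may differ from the order shown in the question.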