Sort the words in a text by occurrence count, with words of equal count on the same line

Time: 2019-04-12 00:37:49

Tags: linux sorting tr

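I want to find the most frequently used words in a text, sorted by count, with all words that share the same count printed on the same line.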

    grep -oE '[[:alpha:]]+' file.txt | sort | uniq -c | sort -nr

This gives

3 linux
3 fedora
2 ubuntu
2 mandriva

What I'm looking for is

3 linux fedora
2 ubuntu mandriva


1 Answer:

Answer 0: (score: 0)

I couldn't do this in a bash one-liner, but if it works for you, here is a short Python script that does it.

    import os

    # Let the shell pipeline do the counting; each output line is "<count> <word>".
    preMergedList = os.popen(r"grep -o -E '\w+' file.txt | sort | uniq -c | sort -nr").readlines()

    # Map each count to the words that occur that many times.
    countDict = {}
    for line in preMergedList:
        count, word = line.split()
        countDict.setdefault(int(count), []).append(word)

    # Print one line per count, highest count first.
    for count, words in sorted(countDict.items(), reverse=True):
        print(count, " ".join(words))
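
If shelling out to grep is not convenient, the same grouping can probably be done in Python alone. The following is only a minimal sketch, not a tested replacement for the pipeline: the function name `group_words_by_count` is made up for illustration, and it assumes a word is a run of ASCII letters, which is narrower than `[[:alpha:]]` in some locales. Words that share a count are listed in order of first appearance rather than in `sort` order.

```python
import re
from collections import Counter, defaultdict


def group_words_by_count(text):
    """Return a list of (count, [words]) pairs, highest count first."""
    # Assumption: a word is a run of ASCII letters.
    counts = Counter(re.findall(r"[A-Za-z]+", text))

    # Collect the words that share the same count.
    grouped = defaultdict(list)
    for word, n in counts.items():
        grouped[n].append(word)

    # Counts are unique keys, so sorting compares the integers only.
    return sorted(grouped.items(), reverse=True)


# Sample text chosen to match the counts shown in the question.
sample = "linux fedora ubuntu linux fedora mandriva ubuntu linux fedora mandriva"
for n, words in group_words_by_count(sample):
    print(n, " ".join(words))
# prints:
# 3 linux fedora
# 2 ubuntu mandriva
```

To run it on the file from the question, read the file and pass its contents, for example `group_words_by_count(open("file.txt").read())`.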