Print each word and its number of occurrences using pure `bash`

Time: 2017-12-19 05:24:57

Tags: bash shell

I have given my code below. I want to print each word together with its number of occurrences, without using external tools such as dict.

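I can count the total number of words, but there is a problem there too: the output does not give the full word count; it is lower than it should be.

What should I do?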


2 answers:

Answer 0 (score: 0)

You can use an associative array to count the words, something like this:

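$ cat foo.sh
#!/bin/bash

# Associative array (bash 4+): word -> number of occurrences
declare -A words

# Read standard input line by line; -r keeps backslashes literal
while read -r line
do
    # Unquoted on purpose so the line is split into words on whitespace
    for word in $line
    do
        ((words[$word]++))
    done
done

# Print every word with its count (iteration order is unspecified)
for i in "${!words[@]}"
do
    echo "$i:" "${words[$i]}"
done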

Test it:

$ echo this is a test is this | bash foo.sh
is: 2
this: 2
a: 1
test: 1

This answer was built almost entirely from these excellent answers: this and this. Don't forget to upvote them.
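
The question also asks for the total number of words. The following is a minimal sketch of how the same associative-array approach could report a total as well, using only bash builtins. The function name `count_words` and the `Total words:` label are hypothetical and not taken from the answer above; the `|| [[ -n $line ]]` test is added on the assumption that the input may end without a trailing newline, which is one common reason for a count that comes out too low.

```bash
#!/bin/bash
# Sketch only: count_words is a hypothetical helper name.
# Requires bash 4+ for associative arrays.
count_words() {
    local -A counts=()
    local -a words
    local line word total=0

    # `|| [[ -n $line ]]` also processes a last line that has no newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # read -a splits on whitespace without pathname expansion,
        # so a word such as * is counted literally.
        read -r -a words <<< "$line"
        for word in "${words[@]}"; do
            counts[$word]=$(( ${counts[$word]:-0} + 1 ))
            total=$(( total + 1 ))
        done
    done

    # Iteration order of an associative array is unspecified.
    for word in "${!counts[@]}"; do
        printf '%s: %d\n' "$word" "${counts[$word]}"
    done
    printf 'Total words: %d\n' "$total"
}

count_words <<< "this is a test is this"
```

The total is accumulated in the same loop as the per-word counts, so no second pass over the input and no external tool such as `wc` is needed.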

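Answer 1 (score: 0)

Two improved versions of James Brown's answer (they take punctuation within a word into account and break up double- and single-quoted groups):

  1. Punctuation is treated as part of the word: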

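    #!/bin/bash
    declare -A words
    
    while read -r line ; do
        for word in ${line} ; do
            # ${word@Q} (bash 4.4+) shell-quotes the word, so quote
            # characters inside it are not meant to break the expression
            ((words[${word@Q}]++))
        done
    done
    
    # Quote the expansions so keys are not split or glob-expanded
    for i in "${!words[@]}" ; do
        echo "${i}: ${words[$i]}"
    done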
    
  2. Punctuation is not part of the word (as in wc):

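    #!/bin/bash
    declare -A words
    
    while read -r line ; do
        # Remove all punctuation characters before splitting into words
        line="${line//[[:punct:]]}"
        for word in ${line} ; do
            ((words[${word}]++))
        done
    done
    
    # Quote the expansions so keys are not split or glob-expanded
    for i in "${!words[@]}" ; do
        echo "${i}: ${words[$i]}"
    done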
    
  3. Code tested with tricky quoted text:

    • fortune -m "swear" | bash foo.sh

    • man bash | ./foo.sh | sort -gr -k2 | head
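
To see what the punctuation stripping in version 2 does to a line before it is split into words, here is a small sketch that isolates that one expansion. The helper name `strip_punct` and the sample sentence are hypothetical; only the `${line//[[:punct:]]/}` expansion comes from the answer.

```bash
#!/bin/bash
# Sketch only: strip_punct is a hypothetical helper name.
# It applies the same pattern substitution that version 2 uses.
strip_punct() {
    local line=$1
    # Delete every character in the POSIX [:punct:] class.
    printf '%s\n' "${line//[[:punct:]]/}"
}

sample="Hello, world! It's a well-known test."
printf 'original: %s\n' "$sample"
printf 'stripped: %s\n' "$(strip_punct "$sample")"
```

Note that the characters are deleted rather than replaced by a space, so `It's` becomes `Its` and `well-known` becomes `wellknown`; with this approach such forms are counted as single words.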