Question

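For some time now I have been trying to write a bash script that reads a *.txt file and prints how often each word occurs. So far I have had no luck. I know the algorithm; my only problem is the syntax.

How should the script work?

When I type ./myScript.sh myTextFile.txt in the terminal, it should print every word's number of occurrences, sorted from highest to lowest, like this: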

17 is 7.1%  
12 all 6.4%  
10 house 5.5%  
5 tree 3.7%

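... and so on.

If I pass a switch, ./myScript.sh -x 3 myTextFile.txt, it should print only words longer than 3 characters.

If I pass a switch, ./myScript.sh -y 4 myTextFile.txt, it should print only words that occur 4 or more times. This is where I have had a lot of trouble: working out which switches were given and what values they hold.

Of course, if I pass a nonexistent file or an invalid switch, the script should report an error.

Thanks for your help.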

Answer 1

You can use awk to get the word count:

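 awk '{for(i=1;i<=NF;i++){a[$i]++;tot++}}END{for(j in a) {printf("%d %s %.1f%%\n",a[j],j,a[j]/tot*100)}}' myTextFile.txt | sort -rn

This awk command fills the array a[], which is indexed by word and holds the number of times each word occurs.

tot is the total number of words encountered.

The END block loops through the array and prints the count, the word, and the percentage. The percent sign is written as %% because a bare % inside a printf format is not a literal character.

sort -rn sorts numerically on the leading count, in reverse order, so the most frequent words come first as the question asks.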
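
The -x and -y switches and the error cases from the question can be layered on top of the same awk command with the shell's getopts builtin. What follows is only a sketch under a few assumptions: the names word_freq, min_len and min_count are made up for illustration, words are whatever awk splits on whitespace (so punctuation and letter case are not normalised), and percentages are taken against all words in the file, not only the ones that pass the filters.

```bash
#!/usr/bin/env bash
# Sketch only: word_freq, min_len and min_count are illustrative names,
# not something taken from the question or the answer above.

word_freq() {
    local min_len=0 min_count=1 opt OPTIND=1

    # The leading ':' puts getopts in silent mode so we print our own errors.
    while getopts ':x:y:' opt; do
        case "$opt" in
            x) min_len=$OPTARG ;;
            y) min_count=$OPTARG ;;
            :) echo "error: option -$OPTARG requires a value" >&2; return 2 ;;
            \?) echo "error: unknown option -$OPTARG" >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))

    # Both option values must be non-negative integers.
    case "$min_len" in ''|*[!0-9]*)
        echo "error: -x expects a non-negative integer" >&2; return 2 ;;
    esac
    case "$min_count" in ''|*[!0-9]*)
        echo "error: -y expects a non-negative integer" >&2; return 2 ;;
    esac

    # Exactly one file argument must remain, and it must be readable.
    if [[ $# -ne 1 ]]; then
        echo "usage: myScript.sh [-x MINLEN] [-y MINCOUNT] FILE" >&2
        return 2
    fi
    if [[ ! -f $1 || ! -r $1 ]]; then
        echo "error: cannot read file '$1'" >&2
        return 1
    fi

    # Count every whitespace-separated word, then print the ones that are
    # longer than minlen characters and occur at least mincount times.
    awk -v minlen="$min_len" -v mincount="$min_count" '
        { for (i = 1; i <= NF; i++) { count[$i]++; total++ } }
        END {
            for (w in count)
                if (length(w) > minlen && count[w] >= mincount)
                    printf "%d %s %.1f%%\n", count[w], w, count[w] / total * 100
        }' "$1" | sort -rn
}
```

To use it as ./myScript.sh, save the function in that file and add word_freq "$@" as the last line; the script's exit status is then the function's return value (1 for an unreadable file, 2 for a bad switch or argument count). Because both switches are parsed in the same loop, they can also be combined, for example ./myScript.sh -x 3 -y 4 myTextFile.txt.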