Question

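I have a folder containing files, and I am trying to write a shell script that prints each file's name together with the number of times a certain word is repeated in that file.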

My output should look like this:

filename 3
filename 12
filename 24
…

Here filename is just the file name, without the path or the extension.

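I managed to do it with a for loop, but I don't think that is efficient in terms of execution time, so my other idea was to use the grep command: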

grep -c "word" */*.txt

The output I get looks like this:

folder/filename.txt:3

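I tried using the cut command, but I can't figure out how to avoid cutting off the number of times the word occurs in each file, and there has to be a space between the file name and the number.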

grep -c "word" */*.txt | cut -d'/' -f2 | cut -d'.' -f1

Any ideas on how to do this with grep or some alternative?

Answer 1

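You went to a lot of effort with cut. When you solve a problem with cut, most of the time you have found a quick solution. In this case, however, the fixes the cut command needs give an ugly result.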

# Ugly cutting
grep -c "word" */*.txt | cut -d'/' -f2 | tr ':' '.' | cut -d"." -f1,3 | tr '.' ' '

Patching up cut like this is the wrong approach, but you can learn some cool things from it:

# going weird
# Combine first column
grep -c "word" */*.txt | cut -d'/' -f2 | cut -d"." -f1
# with second column
grep -c "word" */*.txt | cut -d'/' -f2 | cut -d":" -f2
# using paste and process substitution
paste -d" " <(grep -c "word" */*.txt | cut -d'/' -f2 | cut -d"." -f1) <(grep -c "word" */*.txt | cut -d'/' -f2 | cut -d":" -f2)

No, this is not the way to solve this problem. Use sed instead:

grep -c "word" */*.txt | sed 's#.*/##;s#\..*:# #'
# or shorter
grep -c "word" */*.txt | sed 's#.*/\([^.]*\).*:#\1 #'
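For anyone who wants to try this on a scratch directory, here is a minimal sketch that wraps the shorter sed pipeline in a function. The function names are hypothetical and not part of the original answer. Two assumptions are worth stating. First, -H (a GNU/BSD grep option, not POSIX) is added so the file name is printed even when the glob matches only one file. Second, grep -c counts matching lines, not individual occurrences, so a line containing the word twice counts once. If you need true occurrence counts, the second function is one possible alternative using awk; it treats the word as a regular expression and skips empty files.

```shell
#!/bin/sh
# Hypothetical helper: print "<name> <count>" for every */*.txt file,
# where <count> is the number of LINES that contain the pattern ($1).
count_matching_lines() {
    # -H forces the "file:" prefix even for a single file (GNU/BSD extension).
    # sed keeps the base name up to its first dot and turns ":" into a space.
    grep -cH -- "$1" */*.txt | sed 's#.*/\([^.]*\).*:#\1 #'
}

# Hypothetical alternative: count every OCCURRENCE of the pattern ($1),
# in a single awk process instead of a shell loop.
# Caveat: empty files produce no output line, because no record is ever read.
count_occurrences() {
    awk -v pat="$1" '
        FNR == 1 {                          # first line of a new file
            if (name != "") print name, n   # flush the previous file
            name = FILENAME
            sub(/.*\//, "", name)           # strip the directory part
            sub(/\.[^.]*$/, "", name)       # strip the last extension
            n = 0
        }
        { n += gsub(pat, "&") }             # gsub returns the number of matches
        END { if (name != "") print name, n }
    ' */*.txt
}
```

For example, if folder/alpha.txt contains the three lines "word word", "nothing" and "word", then count_matching_lines word prints "alpha 2", while count_occurrences word prints "alpha 3".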

Grep command - print the file name and the number of repetitions of a certain word in each file, in two columns
