Question

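I'm trying to search a set of directories and list the files in which a certain string occurs more than X times.

For example, I'd like to search /home/userX/files (and all subdirectories) and list all files in which the string "uploads" occurs more than 10 times.

Ideally, output like this would be great: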

/home/userX/files/file1:15
/home/userX/files/file2:34
/home/userX/files/file3:67

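where :xx is the number of times the string occurs in that file... but this final count isn't required, just nice to have.

I've worked out how to find files containing a specific string, how to count a string within a single file, and how to list the files in which a string occurs, but I can't put these together... now I'm just thoroughly confused.

Any help is appreciated!

Thanks in advance.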

Answer 1

I now have something I'm happy with:

grep -Hcri uploads * | awk -F ':' '$2>10  {print}'

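This still ignores multiple occurrences of 'uploads' on a single line, but it should be fairly fast.

-H means keep the file name in the output
-c means only count the matching lines
-r is for recursive
-i is for case-insensitive

It pipes the output to awk, which splits each line on the colon (-F ':') and prints the whole line if the second field is greater than 10. Note that a file name containing a colon would shift the count out of the second field.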
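To see what counting lines rather than occurrences means in practice, here is a minimal sketch that compares grep -c (matching lines) with grep -o piped to wc -l (every match). The temporary sample file is made up for illustration and is not part of the original answer.

```shell
# Hypothetical sample: two matching lines, three occurrences in total.
tmp=$(mktemp -d)
printf 'uploads uploads\nuploads\n' > "$tmp/sample.txt"

# -c counts matching lines; -o prints each match on its own line,
# so wc -l then counts individual occurrences.
lines=$(grep -ci uploads "$tmp/sample.txt")
hits=$(grep -oi uploads "$tmp/sample.txt" | wc -l | tr -d ' ')

echo "matching lines: $lines, occurrences: $hits"
rm -rf "$tmp"
```

For this sample it prints "matching lines: 2, occurrences: 3", so a threshold applied to the grep -c count can miss files whose matches are packed onto few lines.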

Answer 2

Try this (edited based on the comments):

find . -type f | xargs grep -o [STRING] | awk -F':' '{print $1}' | uniq -c | awk '$1>=X {print;}'

Replace [STRING] with the string you are searching for and X with the number of occurrences a file must have to be listed. For your example:

find . -type f | xargs grep -o uploads | awk -F':' '{print $1}' | uniq -c | awk '$1>=10 {print;}'
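A variant of the same idea, sketched under a few assumptions: it lets find call grep directly (so file names with spaces survive), forces the file name prefix with -H, treats the search term as a fixed string with -F, and does the counting in awk so the output is file:count with a strict "more than" test. The function name list_over is hypothetical, and file names containing a colon are still not handled.

```shell
#!/bin/sh
# list_over DIR STRING MIN: print "file:count" for every file under DIR in
# which STRING occurs more than MIN times (each occurrence is counted).
# list_over is a hypothetical helper name, not part of the original answer.
list_over() {
    find "$1" -type f -exec grep -oHF -e "$2" {} + |
        awk -F: -v min="$3" '
            { n[$1]++ }    # grep -o emits one "file:match" line per occurrence
            END { for (f in n) if (n[f] > min) print f ":" n[f] }'
}
```

It could be called as list_over /home/userX/files uploads 10. The order of the output lines is not defined, so pipe the result through sort if a stable order matters.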

Answer 3

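A bash solution involves a small script that lets the user specify the search term, the minimum occurrences per file, and the search path. It then collects /absolute/path/to/file:matches entries for the files in which the search term occurs at least occurs times, saving all matching files in an array for later use. (Because it relies on grep -c, the count is the number of matching lines.) For the purposes of this example, it simply prints the search criteria and the matching files held in the array: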

#!/bin/bash

[ $# -eq 3 ] || {                   ## test for sufficient input
    printf "error: insufficient input:  usage:  %s term occurs path\n" "${0//*\//}"
    exit 1
}

[ "$2" -eq "$2" ] 2>/dev/null || {  ## test that 'occurs' is an integer value
    printf "error: invalid input:  occurs '%s' is not an integer value!\n" "$2"
    printf "\n  usage: %s term occurs path\n\n" "${0//*\//}"
    exit 1
}

[ -d "$3" ] || {                   ## test path is a valid directory
    printf "error: invalid input:  path '%s' is not a valid directory!\n" "$3"
    printf "\n  usage: %s term occurs path\n\n" "${0//*\//}"
    exit 1
}

srchterm="$1"       ## assignment of arguments to variables
occur=$2
srchpath="$3"

declare -a array    ## declare array to hold values

## for each file containing $srchterm
while IFS=$'\n' read -r line; do 
    [ "${line##*:}" -ge "$occur" ] &&                     ## test it occurs >= occur
        array+=( "$(realpath "${line%:*}"):${line##*:}" ) ## if so add it to array
done < <(grep -r -c "$srchterm" "$srchpath"/* )           ## grep -r -c prints file:count lines

## output search information
printf "\nsearch term : %s\noccurrences : %d\nsearch path : %s\n\n" \
"$srchterm" "$occur" "$srchpath"
printf "number of matching files : %d\n\n" "${#array[@]}"

for i in "${array[@]}"; do  ## output matching files
    printf "%s\n" "$i"
done

exit 0

Usage/Output

$ bash srchterminfile.sh char 10 .

search term : char
occurrences : 10
search path : .

number of matching files : 77

/home/david/dev/src-c/tmp/arginfo.c:16
/home/david/dev/src-c/tmp/bin_chs_test.c:19
/home/david/dev/src-c/tmp/binprntst.c:42
/home/david/dev/src-c/tmp/binprnverif.c:12
/home/david/dev/src-c/tmp/bookmgr.c:17
/home/david/dev/src-c/tmp/censorwds.c:11
/home/david/dev/src-c/tmp/ch+13.c:14
/home/david/dev/src-c/tmp/ch13str.c:20
/home/david/dev/src-c/tmp/chkendian.c:16
/home/david/dev/src-c/tmp/concatwords.c:17
<snip>

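Note: if you don't need to keep the matching files in an array for later use, you can simply remove the array and replace it with a printf or similar statement that just outputs each line. I understood that you wanted to combine and save the matching file names and occurrence counts within a script.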
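A minimal sketch of that array-free variant follows, assuming the same three arguments as the script above. The function name print_matches is hypothetical. Unlike the script, it prints paths as grep reports them (no realpath call) and passes the directory itself to grep instead of a glob.

```shell
#!/bin/sh
# print_matches TERM OCCURS PATH: stream "file:count" lines directly instead
# of collecting them in an array. print_matches is a hypothetical name.
print_matches() {
    grep -r -c -e "$1" "$3" | while IFS= read -r line; do
        # ${line##*:} is the count grep -c printed after the last colon
        if [ "${line##*:}" -ge "$2" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

For example, print_matches char 10 . lists every file under the current directory with at least 10 lines containing char.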

How to list the files in a directory that contain a specific string more than X times