Question

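So far, my bash script takes two arguments: the input, which can be a file or a directory, and the output, which is the output file. It recursively finds all files; if the input is a file, it counts every occurrence of each word in the files it found and lists them in the output file, with the count on the left and the word on the right, sorted from greatest to least. Right now it also counts numbers as words, which it should not do. How can I count only occurrences of valid words, without numbers? Also, in the last if statement, when the input is a directory, I am having trouble getting it to do what I do for a file. It needs to find all files in that directory, and if there is another directory inside it, find all files in that one too, and so on. Then it needs to count every occurrence of each word across all the files and store the result in the output file, just as in the file case. I was thinking of storing the files in an array, but I am not sure that is the best way, and my syntax is off because it does not work. How should I do this? Thanks!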

    #!/bin/bash

    INPUT="$1"
    OUTPUT="$2"
    ARRAY=();

    # Check that there are two arguments
    if [ "$#" -ne 2 ]
    then
       echo "Usage: $0 {dir-name}";
       exit 1
    fi

    # Check that INPUT is different from OUTPUT
    if [ "$INPUT" = "$OUTPUT" ]
    then
       echo "$INPUT must be different from $OUTPUT";
    fi

    # Check if INPUT is a file...if so, find number of occurrences of each word
    # and store in OUTPUT file sorted in greatest to least
    if [ -f "$INPUT" ]
    then
       for name in $INPUT; do
          if [ -f "$name" ]
          then
             xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
          fi
       done
    # If INPUT is a directory, find number of occurrences of each word
    # and store in OUTPUT file sorted in greatest to least
    elif [ -d "$INPUT" ]
    then
       find $name -type f > "${ARRAY[@]}"
       for name in "${ARRAY[@]}"; do
          if [ -f "$name" ]
          then
             xargs grep -hoP '\b\w+\b' < "$name" | sort | uniq -c | sort -n -r > "$OUTPUT"
          fi
       done
    fi

Answer 1

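I do not recommend specifying the output file, because you then have to do more validity checking for it, e.g.:

the output should not exist (if you do not want to allow overwriting)
if you want to allow overwriting and the output exists, it must be a regular file
and so on...
also, it would be nice to be able to pass more than one input directory or file as arguments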
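
To give an idea of how much checking an output argument needs, here is a minimal sketch of the checks listed above. The function name `check_output` and its `overwrite` switch are hypothetical, and the parent-directory test is an extra assumption covering part of the "and so on".

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if "$1" is acceptable as an output path.
# Pass "overwrite" as the second argument to allow replacing an existing file.
check_output() {
    local out="$1" mode="${2:-no-overwrite}"
    if [ -e "$out" ]; then
        # Without the overwrite switch, any existing path is rejected.
        if [ "$mode" != "overwrite" ]; then
            echo "error: $out already exists" >&2
            return 1
        fi
        # With overwriting allowed, only a regular file may be replaced.
        if [ ! -f "$out" ]; then
            echo "error: $out exists and is not a regular file" >&2
            return 1
        fi
    fi
    # Assumption: the parent directory must exist and be writable.
    local dir
    dir="$(dirname -- "$out")"
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "error: cannot write into $dir" >&2
        return 1
    fi
    return 0
}
```

It could be called as `check_output "$2" || exit 1` before anything is written.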

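So it is better (and more bash-like) to write the result to standard output, which you can redirect to a file when invoking the script, like this: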

bash wordcounter.sh files or directories more than one to count words > to_some_file

e.g.

bash wordcounter.sh some_dir >result.txt
#or
bash wordcounter.sh file1.txt file2.txt .... fileN.txt > result2.txt
#or
bash wordcounter.sh dir1 file1 dir2 file2 >result2.txt

The whole wordcounter.sh could be the following:

for arg
do
    find "$arg" -type f -print0
done |xargs -0 grep -hoP '\b[[:alpha:]]+\b' |sort |uniq -c |sort -nr

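where:

find searches every argument for regular files
and the counting pipeline runs on the resulting list of files
the pattern uses [[:alpha:]] instead of \w, so digits are no longer matched as words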
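
To see why the character class matters for the "numbers counted as words" problem, here is a small comparison, assuming GNU grep with `-P` support. The sample text is made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample text mixing words, numbers, and a word ending in a digit.
sample='foo 42 bar foo 7 baz2'

# \w matches letters, digits, and underscore, so numbers are reported as words.
with_w="$(printf '%s\n' "$sample" | grep -oP '\b\w+\b' | tr '\n' ' ')"

# [[:alpha:]] matches letters only; the \b anchors also reject "baz2",
# because there is no word boundary between "z" and "2".
alpha_only="$(printf '%s\n' "$sample" | grep -oP '\b[[:alpha:]]+\b' | tr '\n' ' ')"

printf 'word-class pattern:  %s\n' "$with_w"
printf 'alpha-class pattern: %s\n' "$alpha_only"
```

Note that mixed tokens such as `baz2` are dropped entirely rather than trimmed to `baz`; whether that is what you want depends on your definition of a valid word.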

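The script still has some drawbacks, e.g. it will also try to count words in image files; maybe you will ask about that in the next question in this series ;)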
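
One possible way around the image-file drawback, sketched under the assumption that GNU grep and GNU xargs are available: the `-I` option makes grep treat binary files as if they contained no matches. The function name `wordcounter_text_only` is hypothetical.

```shell
#!/bin/bash
# Variant of the counting pipeline that ignores binary files.
# -I : GNU grep treats binary files as non-matching.
# -r : GNU xargs does not run grep at all when find produced no files.
wordcounter_text_only() {
    for arg in "$@"; do
        find "$arg" -type f -print0
    done | xargs -0 -r grep -IhoP '\b[[:alpha:]]+\b' | sort | uniq -c | sort -nr
}
```

grep decides what is binary from the file contents (for example, the presence of NUL bytes), so this is a heuristic and not a check of the file type.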

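Edit

If you really want a two-argument script, e.g. script where_to_search output (which is not very bash-like), put the script above into a function and do whatever you want with it, e.g.: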

#!/bin/bash

wordcounter() {
    for arg
    do
        find "$arg" -type f -print0
    done |xargs -0 grep -hoP '\b[[:alpha:]]+\b' |sort |uniq -c |sort -nr
}

where="$1"
output="$2"
#do here the necessary checks
#...
#and run the function
wordcounter "$where" > "$output"
#end of script
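
If you do want the file list in an array, as the question attempted, the following is a minimal sketch and not a recommendation. In the question, `find ... > "${ARRAY[@]}"` redirects into a file name instead of filling the array. The helper names `collect_files` and `count_from_array` and the global `FILES` are hypothetical, and `mapfile -d ''` assumes bash 4.4 or newer.

```shell
#!/bin/bash
# Hypothetical helper: fill the global array FILES with every regular file
# found under the given paths. NUL separators keep odd file names intact.
collect_files() {
    FILES=()
    mapfile -d '' -t FILES < <(find "$@" -type f -print0)
}

# Count words across all collected files with a single grep call.
count_from_array() {
    # With an empty array grep would read standard input, so stop early.
    [ "${#FILES[@]}" -gt 0 ] || return 0
    grep -hoP '\b[[:alpha:]]+\b' -- "${FILES[@]}" | sort | uniq -c | sort -nr
}
```

Usage would be `collect_files "$where"` followed by `count_from_array > "$output"`. With a very large number of files the expanded array can exceed the command-line length limit, which is one reason the `find | xargs` pipeline is the simpler choice.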
