Question

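I have a script that searches through all files in a directory and pulls out the number next to the word <Overall>. I now want to get the average of the numbers from each file and output the file name next to the average, to two decimal places. I've gotten most of it to work except displaying the average. I should say I think it works: I'm not sure it is pulling all of the instances in the file, and I'm definitely not sure it is finding the average, which is hard to tell without the precision. I'm also sorting by the average at the end. I'm trying to use awk and bc to get the average; there is probably a better method.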

What I have now:

path="/home/Downloads/scores/*"

(for i in $path
do
    echo `basename $i .dat` `grep '<Overall>' < $i |
    head -c 10 | tail -c 1 | awk '{total += $1} END {print total/NR}' | bc`
done) | sort -g -k 2

The output I get is:

John 4
Lucy 4
Matt 5
Sara 5

But it shouldn't be an integer; it should be shown to two decimal places.

Also, the files I am searching look like this:

<Student>John
<Math>2
<English>3
<Overall>5

<Student>Richard
<Math>2
<English>2
<Overall>4

Answer 1

In general, your script does not extract all numbers from each file, but only the first digit of the first number. Consider the following file:

<Overall>123 ...
<Overall>4 <Overall>56 ...
<Overall>7.89 ...
<Overall> 0 ...

The command grep '<Overall>' | head -c 10 | tail -c 1 extracts only 1.
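To see why only 1 survives, the pipeline can be run against a copy of the sample file above. This is a small sketch; the temporary file and the variable names are made up for illustration.

```shell
#!/bin/bash
# Recreate the sample file from above in a temporary location (hypothetical path).
sample=$(mktemp)
printf '%s\n' '<Overall>123 ...' \
              '<Overall>4 <Overall>56 ...' \
              '<Overall>7.89 ...' \
              '<Overall> 0 ...' > "$sample"

# head -c 10 keeps the first ten bytes of the whole stream: "<Overall>1"
# ("<Overall>" is nine bytes long). tail -c 1 then keeps the last of those bytes.
first_digit=$(grep '<Overall>' "$sample" | head -c 10 | tail -c 1)
echo "$first_digit"

rm -f "$sample"
```

Because head and tail count bytes over the whole stream rather than per line, every later line and every later digit is discarded.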

To extract all numbers preceded by <Overall>, you can use grep -Eo '<Overall> *[0-9.]*' | grep -o '[0-9.]*' or (depending on your version of grep) grep -Po '<Overall>\s*\K[0-9.]*'.
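As a quick check, here is the two-step grep applied to the same sample file. It is a sketch that assumes GNU grep; the temporary file and the variable name numbers are only for illustration.

```shell
#!/bin/bash
# Recreate the sample file in a temporary location (hypothetical path).
sample=$(mktemp)
printf '%s\n' '<Overall>123 ...' \
              '<Overall>4 <Overall>56 ...' \
              '<Overall>7.89 ...' \
              '<Overall> 0 ...' > "$sample"

# Step 1: keep only "<Overall>" plus optional spaces plus the number.
# Step 2: keep only the digits and dots from each of those matches.
numbers=$(grep -Eo '<Overall> *[0-9.]*' "$sample" | grep -o '[0-9.]*')
echo "$numbers"

rm -f "$sample"
```

Each number ends up on its own line, including both numbers from the line that contains <Overall> twice, which is the shape awk or datamash expects as input.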

To compute the average of these numbers, you can use your awk command or a specialized tool such as ... | average (from the package num-utils) or ... | datamash mean 1.
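The awk variant can be wrapped in a small function. The sketch below adds a guard for empty input, which the original command does not have; the function name average and the "n/a" placeholder are assumptions made for this example.

```shell
#!/bin/bash
# Read one number per line from stdin and print the mean.
# The NR > 0 guard (an addition for this sketch) avoids dividing by zero
# when a file contains no <Overall> lines at all.
average() {
    awk '{ total += $1 } END { if (NR > 0) print total / NR; else print "n/a" }'
}

avg=$(printf '%s\n' 123 4 56 7.89 0 | average)
echo "$avg"
```

Without the guard, awk aborts with a division-by-zero error on empty input, so some fallback is worth having if not every file is guaranteed to contain a score.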

To print numbers with two decimal places (i.e. 1.00 instead of 1, and 2.35 instead of 2.34567), you can use printf.
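A short illustration of the printf formatting, using the two values mentioned above. Setting LC_ALL=C is an assumption added here so that the decimal point is parsed and printed as "." regardless of the user's locale.

```shell
#!/bin/bash
# %.2f pads or rounds to exactly two decimal places.
# LC_ALL=C (an addition for this sketch) forces "." as the decimal separator.
padded=$(LC_ALL=C printf '%.2f' 1)
rounded=$(LC_ALL=C printf '%.2f' 2.34567)
echo "$padded $rounded"
```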

#! /bin/bash
path=/home/Downloads/scores
for i in "$path"/*; do
    # Extract every number that follows <Overall>, then average them
    avg=$(grep -Eo '<Overall> *[0-9.]*' "$i" | grep -o '[0-9.]*' |
          awk '{total += $1} END {print total/NR}')
    printf '%s %.2f\n' "$(basename "$i" .dat)" "$avg"
done |
sort -g -k 2

Sorting works only if the file names contain no whitespace (spaces, tabs, newlines).
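If names may contain spaces, one possible workaround is to separate name and average with a tab and tell sort to use the tab as the field delimiter. This is only a sketch with made-up names and averages, and it still assumes that names contain no tabs or newlines.

```shell
#!/bin/bash
# Hypothetical name/average pairs, separated by a tab instead of a space.
# -t $'\t' makes the tab the field separator, so "Mary Ann" stays one field;
# -k 2,2g sorts on the second field only, as a general numeric value.
sorted=$(printf '%s\t%s\n' 'Mary Ann' 4.50 'John' 4.00 'Lucy' 4.25 |
         sort -t $'\t' -k 2,2g)
echo "$sorted"
```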

Note that you can swap out the two lines after avg=$( for any of the methods mentioned above.

Answer 2

You can retrieve the values with a sed command and compute their average with bc:

# Read stdin, store the values in an array and perform a bc call
function avg() {
  local l
  mapfile -t l
  # Join the values with "+" inside a subshell, so IFS is set before
  # ${l[*]} is expanded and the change stays local
  (IFS=+; bc <<< "scale=2; (${l[*]})/${#l[@]}")
}

# Browse the .dat files, then display the average for each file
find . -iname "*.dat" |
  while IFS= read -r f
  do
    name=${f##*/} # Remove the dirname (for display only)
    # Print the file basename and a tab (no newline)
    echo -en "${name%.dat}\t"
    # Retrieve all the "Overall" values and pass them to our avg function;
    # -n together with the p flag prints only the lines that matched
    sed -n -E -e 's/<Overall>([0-9]+)/\1/p' "$f" | avg
  done

Sample output:

score-2 1.33
score-3 1.33
score-4 1.66
score-5 .66

Answer 3

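The pipeline head -c 10 | tail -c 1 | awk '{total += $1} END {print total/NR}' | bc needs improvement.

head -c 10 | tail -c 1 keeps only the tenth character of the first Overall line of each file; it is better to drop it.
Instead, use awk to "remove" the prefix <Overall> and extract the number; we can do this by using <Overall> as the input field separator.
Also use awk to format the result to two decimal places.
Since awk already does all the work, bc is no longer needed; drop it.

The pipeline above becomes awk -F'<Overall>' '{total += $2} END {printf "%.2f\n", total/NR}'.
Don't forget to keep the closing backtick (`).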
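Put together, the asker's loop with this pipeline might look like the sketch below. It uses $(...) in place of backticks, which is a substitution made here and not part of the answer, so there is no closing backtick to keep track of. The function name report, the demo directory and the demo data are made up for illustration, and the sketch assumes every .dat file contains at least one <Overall> line.

```shell
#!/bin/bash
# Print "name average" for every .dat file in a directory, sorted by average.
# Assumes each file has at least one <Overall> line (otherwise awk divides by zero).
report() {
    local dir=$1 i
    for i in "$dir"/*.dat; do
        echo "$(basename "$i" .dat)" "$(grep '<Overall>' < "$i" |
            awk -F'<Overall>' '{total += $2} END {printf "%.2f\n", total/NR}')"
    done | sort -g -k 2
}

# Demo on a throwaway directory with hypothetical data.
demo=$(mktemp -d)
printf '%s\n' '<Student>John' '<Math>2' '<Overall>5' '' \
              '<Student>Richard' '<Math>2' '<Overall>4' > "$demo/John.dat"
printf '%s\n' '<Overall>3' '<Overall>4' '<Overall>4' > "$demo/Lucy.dat"

result=$(report "$demo")
echo "$result"

rm -rf "$demo"
```

The grep in front of awk stays in place so that NR counts only the <Overall> lines; without it, the <Student> and <Math> lines would inflate the divisor.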

Get the average of the numbers found in each file to two decimal places
