Calculating averages from a CSV file in a bash script

Date: 2018-11-28 10:16:55

Tags: linux bash shell unix scripting

I am currently writing a bash script to work out the average hourly memory usage of a server; the script writes its output to a .csv file. What will happen is that the script will run every 10 minutes, so after it has run six times within an hour, my .csv file will hold 6 different values for that hour, and so on.

What I want to do is work out the average for each hour with a script.

#date(YYYYMMDDHHmm) total     used
201811270000        10        3
201811270010        10        4
201811270020        10        5
201811270030        10        9
201811270040        10        8
201811270050        10        2
201811270100        10        5
201811270110        10        1
201811270120        10        7
201811270130        10        6
201811270140        10        5
201811270150        10        2
201811270200        10        1

Based on the output above, does anyone know of a way I could find the average for each hour? For example:

The average of hour 201811270000: 5.166666666666667
The average of hour 201811270100: 4.333333333333333

How would I go about this?

Is it even possible to do?

5 Answers:

Answer 0 (score: 2):

Using awk:

awk '
  function calc() {
    if (count) print "The average of hour " date ": " (sum/count);
    count=0; sum=0; date=$1;
  }
  /^#/ {next}             # throw away comment lines
  $1~/00$/ {calc()}       # full hour, time to calculate/reset variables
  END {calc()}            # end of file, ditto
  {count+=1; sum+=$3;}    # update variables at each line
' < file.txt
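
Run against the sample data above, this should print something close to the following (awk's default output format rounds to six significant digits, so the decimals come out shorter than in the question's example):

The average of hour 201811270000: 5.16667
The average of hour 201811270100: 4.33333
The average of hour 201811270200: 1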

Pure bash would be quite tedious, since you would first need to implement a floating-point arithmetic library. :)
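
That said, if you really must stay in bash alone, you can fake the fractional part with scaled integer arithmetic. A minimal sketch under that assumption (bash 4+ for associative arrays; the file name file.txt and all variable names here are mine, not from the answer above):

#!/usr/bin/env bash
# Sketch: per-hour averages in pure bash using scaled integer math.
declare -A sum count

while read -r date total used; do
    [[ $date == \#* || -z $date ]] && continue   # skip the comment header and blank lines
    hour=${date:0:10}                            # YYYYMMDDHH
    (( sum[$hour] += used, count[$hour]++ ))
done < file.txt

# multiply by 1000 before dividing to keep three decimal places
for hour in $(printf '%s\n' "${!sum[@]}" | sort); do
    scaled=$(( sum[$hour] * 1000 / count[$hour] ))
    printf 'The average of hour %s00: %d.%03d\n' \
        "$hour" $(( scaled / 1000 )) $(( scaled % 1000 ))
done

With the sample file this prints, e.g., "The average of hour 201811270000: 5.166" -- truncated, not rounded, which is the price of integer division.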

Answer 1 (score: 0):

I would use tr to squeeze each line into smaller, space-delimited chunks, and cut to extract the part we need for calculating the average. If the format ever gets more complex, you can always enhance the getFieldAtPosition function.

I don't have a full bash available at the moment, so I iterate over an array instead of reading from file input. For a way of reading the file line by line instead, you can have a look at the other answers here.

Bash-only version:

    function average {
        local sum=$1
        local count=$2
        local floatingPointUnits=2

        # dc does the fixed-point division; see https://linux.die.net/man/1/dc
        echo "${floatingPointUnits}k" "$sum" "$count" /p | dc
    }

    function getFieldAtPosition {
        local line=$1
        local position=$2

        echo "$line" | tr -s ' ' | cut -d ' ' -f $position
    }

    function parseHourFromDate {
        local date=$1
        local positionOfHour=4+2+2   # evaluated arithmetically inside ${...}
        local lengthOfHour=2

        echo ${date:positionOfHour:lengthOfHour}
    }

    lines=('201811270000        10        3      ' \
        '201810270020        7        2      ' \
        '201811270100        10        3      ' \
        '201810270140        22        2      ' \
        '201811271000        33        3      ' )

    sum=0
    count=0
    declare -A HOURS
    for line in "${lines[@]}"; do
        date=`getFieldAtPosition "$line" 1`
        number=`getFieldAtPosition "$line" 2`   # for the asker's file, "used" is field 3
        hour=`parseHourFromDate "$date"`

        # new hour, reset
        if [ "$hour" != "$previousHour" ]; then
            sum=0
            count=0
        fi

        sum=$((sum+number))
        count=$((count+1))

        # save average in associative array
        HOURS[$hour]=`average $sum $count`
        previousHour=$hour
    done

    # print results
    for key in "${!HOURS[@]}"; do
        echo "Average of $key: ${HOURS[$key]}"
    done
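
One caveat with the above: the HOURS array is keyed by the two-digit hour alone, so samples from different days that happen to share an hour of day (for example 201811270000 and 201810270020 in the sample array) land in the same bucket. A possible variation of mine, not part of the original answer, is to key by the full date-plus-hour prefix:

    function parseDateHour {
        local date=$1
        local lengthOfDateHour=10   # YYYYMMDDHH

        echo "${date:0:lengthOfDateHour}"
    }

Using hour=`parseDateHour "$date"` in the loop then keeps every day's hours distinct.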

Answer 2 (score: 0):

Using Perl:

> cat ivan.txt
201811270000        10        3
201811270010        10        4
201811270020        10        5
201811270030        10        9
201811270040        10        8
201811270050        10        2
201811270100        10        5
201811270110        10        1
201811270120        10        7
201811270130        10        6
201811270140        10        5
201811270150        10        2
201811270200        10        1
> perl -F'/\s+/'  -lane ' { $F[0]=~s/..$//g;push @{$datekv{$F[0]}},$F[2];} END { for my $x (sort keys %datekv){ $total=0;$z=0; foreach(@{$datekv{$x}}) {$total+=$_;$z++ } print $x,"\t",$total/$z }}' ivan.txt
2018112700      5.16666666666667
2018112701      4.33333333333333
2018112702      1
>

Answer 3 (score: 0):

Calculating with bash and bc:

PROCESS_FILE="file.txt"
PROCESSED_DATE=""

while read -r line; do
        if [[ $line =~ ^# ]]; then
                 continue;
        fi

        LINE_DATE=${line:0:10}
        if [[ $PROCESSED_DATE != *"$LINE_DATE"* ]]; then
                PROCESSED_DATE+=",$LINE_DATE"
                USED_LIST=$(grep $LINE_DATE $PROCESS_FILE | sed 's/  */,/g' | cut -d ',' -f3 | tr '\n' ' ')
                COUNT=0;
                SUM=0;
                for USED in $USED_LIST; do
                        COUNT=$(echo "$COUNT + 1" | bc -l);
                        SUM=$(echo "$SUM + $USED" | bc -l);
                done

                if [ $COUNT -ne 0 ]; then
                        AVG=$(echo "$SUM/$COUNT" | bc -l)
                fi
                echo "The average of hour $LINE_DATE: $AVG"
        fi

done < $PROCESS_FILE
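
For the asker's file, this should print something along these lines; bc -l uses a default scale of 20 digits after the decimal point, and the hour key here is the 10-character date-plus-hour prefix:

The average of hour 2018112700: 5.16666666666666666666
The average of hour 2018112701: 4.33333333333333333333
The average of hour 2018112702: 1.00000000000000000000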

Answer 4 (score: -1):

In bash, here is a short (and somewhat crude) way of doing it:


calc() {
    # evaluate an arithmetic expression (e.g. 31/6) in floating point via awk
    awk "BEGIN { print $* }"
}

# read every line of the input file ($1) into an array, one element per line
IFS=$'\r\n' GLOBIGNORE='*' command eval 'memory=($(<'$1'))'
for (( i = 0; i < ${#memory[@]}; i++ )); do
    # append "date used" for the current line to a scratch file
    echo "${memory[i]}" | awk '{print $1" "$3}' >> values.txt
    # running sum and count of the "used" values seen so far;
    # note this prints a cumulative average up to each timestamp, as the output below shows
    total=$(awk '{ Values += $2 } END { printf "%0.0f", Values }' values.txt)
    length=$(awk '{print $2}' values.txt | wc -l)
    echo "The average of hour $(awk '{print $1}' values.txt | tail -n1): $(calc ${total}/${length})"
done
rm values.txt

You can later change the output so it is forwarded to a file. There are surely more elegant ways for experienced bash users.

For Paul Hodges:

Awk is pointed at the specific column in question, because we do not know whether that column keeps the same width throughout the rest of the file (this still applies).

The tr -d is necessary because the value of the variable has to be an integer and not a string (command line only); see the examples further down.

The result of the execution is as follows:

ivo@spain-nuc-03:~/Downloads/TestStackoverflow$ ./processing.sh test.csv 
The average of hour 201811270000: 3
The average of hour 201811270010: 3.5
The average of hour 201811270020: 4
The average of hour 201811270030: 5.25
The average of hour 201811270040: 5.8
The average of hour 201811270050: 5.16667
The average of hour 201811270100: 5.14286
The average of hour 201811270110: 4.625
The average of hour 201811270120: 4.88889
The average of hour 201811270130: 5
The average of hour 201811270140: 5
The average of hour 201811270150: 4.75
The average of hour 201811270200: 4.46154
ivo@spain-nuc-03:~/Downloads/TestStackoverflow$

The line count by itself comes back as a plain integer:

ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$ cat values.txt | wc -l
13
ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$

Piping through tr -d '\n' also strips the trailing newline, which is why the next prompt ends up glued to the count here (still applies):

ivo@spain-nuc-03:~/Downloads/ScriptsClientes/BashReports/Tools/TextProcessing$ cat values.txt | wc -l | tr -d '\n'
13ivo@spain-nuc-03:

Running wc -l on the file directly, on the other hand, prints the filename after the count, which does not suit the task at hand at all, since it forces you to filter the name back out.

Please make sure of these points before criticizing.