Better solution than looping through the file? (improving performance)

Time: 2018-03-27 16:40:25

Tags: bash

I wrote a script that populates a file as follows:

nodetool="/var/opt/app/cassandra/apache-cassandra-3.11.1/bin/nodetool"

# I need to get some statistics from Cassandra using nodetool
$nodetool tablestats > tmp

# I don't need all the statistics, just the following info
grep 'Keyspace\|Pending Flushes\|Table:\|SSTable count:\|Pending flushes:\|last five minutes' tmp > tablestats

# I need an empty line to separate each table's info
sed -i '/Maximum tombstones/a\ \n' tablestats
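If the temporary file and the separate in-place sed pass are part of the cost, the grep and sed steps can be sketched as a single awk filter that reads nodetool's output on stdin. This is a minimal sketch under stated assumptions, not the original script: the function name filter_tablestats is hypothetical, the selection pattern is the same one the grep uses (rewritten as an awk extended regex), and the here-doc is a hand-written sample in which the "Read Count" and "Space used (live)" lines stand in for statistics that get dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter nodetool tablestats output (read from stdin)
# in one awk pass, replacing the tmp file, the grep and the sed -i step.
filter_tablestats() {
  awk '
    # Same selection as the grep pattern, written as an awk ERE.
    /Keyspace|Pending Flushes|Table:|SSTable count:|Pending flushes:|last five minutes/ {
      print
      # After the last statistic of each table, print a truly empty line.
      if (/Maximum tombstones/) print ""
    }
  '
}

# With the real tool this would be used as:
#   "$nodetool" tablestats | filter_tablestats > tablestats
# Demo on a small hand-written sample:
filter_tablestats <<'EOF'
Keyspace : myKeyspace
    Read Count: 0
    Pending Flushes: 0
        Table: test_table_1
        SSTable count: 0
        Space used (live): 0
        Pending flushes: 0
        Maximum tombstones per slice (last five minutes): NaN
EOF
```

Because the separator is printed by awk itself, it is an empty line with no stray spaces, which is also what a blank-line-based reader expects.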

My script creates a tablestats file with content like this:

    Keyspace : myKeyspace
      Pending Flushes: 0
        Table: test_table_1
        SSTable count: 0
        Pending flushes: 0
        Average live cells per slice (last five minutes): NaN
        Maximum live cells per slice (last five minutes): 0
        Average tombstones per slice (last five minutes): NaN
        Maximum tombstones per slice (last five minutes): NaN

       Table: student_table
       SSTable count: 4
       Pending flushes: 2
       Average live cells per slice (last five minutes): 2
       Maximum live cells per slice (last five minutes): 5
       Average tombstones per slice (last five minutes): NaN
       Maximum tombstones per slice (last five minutes): 9

       Table: sales_table
       SSTable count: 7
       Pending flushes: 3
       Average live cells per slice (last five minutes): 3
       Maximum live cells per slice (last five minutes): 8
       Average tombstones per slice (last five minutes): 6
       Maximum tombstones per slice (last five minutes): 12

    ...

For each table I need to calculate the following values and append them to the end of that table's statistics:

1- Maximum tombstones / Maximum live cells

2- Average tombstones / Average live cells

I wrote the following script to do this:

average_live_cells=0
maximum_live_cells=0
average_tombstones=0
maximum_tombstones=0

touch newFile

while read -r line; do

if [[ $line = *"Average live cells"* ]]; then
    average_live_cells=$(echo $line| cut -d':' -f 2 | xargs)

elif [[ $line = *"Maximum live cells"* ]]; then
    maximum_live_cells=$(echo $line| cut -d':' -f 2 | xargs)


elif [[ $line = *"Average tombstones"* ]]; then
    average_tombstones=$(echo $line| cut -d':' -f 2 | xargs)

elif [[ $line = *"Maximum tombstones"* ]]; then
    maximum_tombstones=$(echo $line| cut -d':' -f 2 | xargs)
fi

if [[ ! $line = *[!\ ]* ]]; then

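    # Compare as strings: inside [[ ]], -eq is arithmetic and treats "NaN" as an unset variable (0).
    # A literal 0 denominator is also reported as NaN so bc never divides by zero.
    if [[ $maximum_live_cells == "NaN" || $maximum_tombstones == "NaN" || $maximum_live_cells == 0 ]] ; then
        calculated_max="NaN"
    else
        calculated_max=$(echo "scale=2 ; $maximum_tombstones / $maximum_live_cells" | bc)
    fi

    if [[ $average_live_cells == "NaN" || $average_tombstones == "NaN" || $average_live_cells == 0 ]] ; then
        calculated_ave="NaN"
    else
        calculated_ave=$(echo "scale=2 ; $average_tombstones / $average_live_cells" | bc)
    fi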

    echo "average_tombstones/average_live_cells: $calculated_ave" >> newFile
    echo -e "maximum_tombstones/maximum live_cells: $calculated_max\n" >> newFile
    average_live_cells=0
    maximum_live_cells=0
    average_tombstones=0
    maximum_tombstones=0
else
    echo "$line" >> newFile
fi

done < tablestats
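One detail of the NaN check is worth spelling out: inside [[ ]], -eq is an arithmetic comparison, so a word such as NaN is evaluated as a shell variable name, and an unset variable counts as 0. A check written as `$x -eq "NaN"` is therefore true whenever the value is NaN or 0, false for other integers, and a value such as 2.5 makes bash report an arithmetic syntax error. A string comparison with == expresses the intent. The small demonstration below unsets a variable named NaN first so the result does not depend on the environment.

```shell
#!/usr/bin/env bash
# Inside [[ ]], -eq compares integers; a bare word like NaN is treated as a
# variable name, and an unset variable evaluates to 0.
unset NaN

for value in 0 NaN 5; do
  if [[ $value -eq "NaN" ]]; then arith=true; else arith=false; fi
  if [[ $value == "NaN" ]]; then str=true; else str=false; fi
  echo "value=$value  -eq NaN: $arith  == NaN: $str"
done
# value=0  -eq NaN: true  == NaN: false
# value=NaN  -eq NaN: true  == NaN: true
# value=5  -eq NaN: false  == NaN: false
```

This is why a table with 0 maximum live cells still comes out as NaN with -eq: the 0 happens to compare equal to the unset NaN variable.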

My loop script produces the following file:

Keyspace : myKeyspace
Pending Flushes: 0
Table: test_table_1
SSTable count: 0
Pending flushes: 0
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): NaN
average_tombstones/average_live_cells: NaN
maximum_tombstones/maximum live_cells: NaN

Table: student_table
SSTable count: 4
Pending flushes: 2
Average live cells per slice (last five minutes): 2
Maximum live cells per slice (last five minutes): 5
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 9
average_tombstones/average_live_cells: NaN
maximum_tombstones/maximum live_cells: 1.80

Table: sales_table
SSTable count: 7
Pending flushes: 3
Average live cells per slice (last five minutes): 3
Maximum live cells per slice (last five minutes): 8
Average tombstones per slice (last five minutes): 6
Maximum tombstones per slice (last five minutes): 12
average_tombstones/average_live_cells: 2.00
maximum_tombstones/maximum live_cells: 1.50

...

But I think there should be a better solution than looping through the file. Do you have a better solution that avoids the loop?

1 answer:

Answer 0 (score: 0)

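Perl has a nice paragraph mode (-00) that lets you read a file one paragraph, i.e. one blank-line-separated block, at a time: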

perl -00 -lpe '
    /Average live.*: (\S+)/ and $al = $1;
    /Maximum live.*: (\S+)/ and $ml = $1;
    /Average tomb.*: (\S+)/ and $at = $1;
    /Maximum tomb.*: (\S+)/ and $mt = $1;
    $ra = ($al and $at and $al > 0 and $at > 0) ? $at/$al : "NaN";
    $rm = ($ml and $mt and $ml > 0 and $mt > 0) ? $mt/$ml : "NaN";
    $_ .= sprintf "\nratio average: %.2f", $ra;
    $_ .= sprintf "\nratio maximum: %.2f", $rm;
' tablestats > newFile
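For comparison, the same single-pass idea can be sketched in awk, which avoids the per-line cut/xargs subprocesses and the per-table bc calls of the bash loop. This is a sketch under a few assumptions: the input has the tablestats layout shown in the question, each table block ends with its "Maximum tombstones" line, and a ratio is reported as NaN when either value is NaN or the denominator is 0. The function name add_ratios is hypothetical; the output labels are copied from the question's script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read tablestats-style text on stdin and, after each
# table's "Maximum tombstones" line, print the two ratios.
add_ratios() {
  awk '
    # Value after the last ": " on a line, with trailing blanks removed.
    function val(line) {
      sub(/.*: */, "", line)
      sub(/[ \t]+$/, "", line)
      return line
    }
    # NaN if either operand is NaN or the denominator is 0; else two decimals.
    function ratio(num, den) {
      if (num == "NaN" || den == "NaN" || den + 0 == 0) return "NaN"
      return sprintf("%.2f", num / den)
    }
    /Average live cells/ { al = val($0) }
    /Maximum live cells/ { ml = val($0) }
    /Average tombstones/ { at = val($0) }
    /Maximum tombstones/ {
      mt = val($0)
      print
      print "average_tombstones/average_live_cells: " ratio(at, al)
      print "maximum_tombstones/maximum live_cells: " ratio(mt, ml)
      next
    }
    # Every other line (Keyspace, Table, counts, blank lines) passes through.
    { print }
  '
}

# Usage with the question's file:  add_ratios < tablestats > newFile
# Demo on the student_table block from the question:
add_ratios <<'EOF'
Table: student_table
SSTable count: 4
Pending flushes: 2
Average live cells per slice (last five minutes): 2
Maximum live cells per slice (last five minutes): 5
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 9

EOF
```

One design difference from the bash version: awk's %.2f rounds, whereas bc with scale=2 truncates, so 2/3 prints as 0.67 here but as .66 with bc. This version also keeps each line's original indentation instead of stripping it.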