How to split input into chunks of six entries using bash?

Date: 2014-01-06 10:06:48

Tags: bash

This is the script I run to produce the raw data in data_tripwire.sh:
#!/bin/sh

LOG=/var/log/syslog-ng/svrs/sec2tes1

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    CBS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.41 |sort|uniq | wc -l`
    echo $CBS >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    GFS=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.31 |sort|uniq | wc -l`
    echo $GFS >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    HR1=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.10.1 |sort|uniq | wc -l `
    echo $HR1 >> /home/secmgr/attmrms1/data_tripwire1.sh
done


for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    HR2=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.12 |sort|uniq | wc -l`
    echo $HR2 >> /home/secmgr/attmrms1/data_tripwire1.sh
done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    PAYROLL=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.21.18 |sort|uniq | wc -l`
    echo $PAYROLL >> /home/secmgr/attmrms1/data_tripwire1.sh

done

for count in 6 5 4 3 2 1 0
do
    MONTH=`date -d"$count month ago" +"%Y-%m"`

    INCV=`bzcat $LOG/$MONTH*.log.bz2|grep 10.55.22.71 |sort|uniq | wc -l`
    echo $INCV >> /home/secmgr/attmrms1/data_tripwire1.sh
done
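Here is a sketch of the same collection step that uses one loop over the servers instead of six copy-pasted loops. It assumes GNU date, bzcat/bzip2, and the log layout from the script above. monthly_counts, collect_all and SERVERS are hypothetical names. Two deliberate changes are flagged in the comments. First, it counts back from the 15th, which is the workaround the GNU date manual suggests for relative months. Second, it uses grep -wF, which is stricter than the original grep because the dots in an unquoted IP pattern are regex wildcards.

```shell
#!/bin/bash
# Sketch: one loop over the servers instead of six copy-pasted loops.
# Assumes GNU date and the same log layout as the question's script.
# monthly_counts, collect_all and SERVERS are hypothetical names.
LOG=/var/log/syslog-ng/svrs/sec2tes1

# Same order as the original loops: CBS, GFS, HR1, HR2, PAYROLL, INCV.
SERVERS=(10.55.22.41 10.55.22.31 10.55.10.1 10.55.21.12 10.55.21.18 10.55.22.71)

# Print one count per month (6 months ago .. this month, oldest first):
# the number of unique log lines that mention the given IP.
monthly_counts () {
    local ip=$1 count month base files
    # Count back from the 15th so "N month ago" cannot overflow into
    # the wrong month (e.g. when the script runs on the 31st).
    base=$(date +%Y-%m-15)
    for count in 6 5 4 3 2 1 0; do
        month=$(date -d "$base $count month ago" +%Y-%m)
        files=("$LOG/$month"*.log.bz2)
        if [ -e "${files[0]}" ]; then
            # -F: dots are literal; -w: 10.55.10.1 does not match 10.55.10.10.
            # (Stricter than the original grep; drop -w to restore its behavior.)
            bzcat "${files[@]}" | grep -wF -- "$ip" | sort -u | wc -l
        else
            echo 0   # no log files for that month
        fi
    done
}

# 6 servers x 7 months = 42 lines, one value per line, grouped by server.
collect_all () {
    local ip
    for ip in "${SERVERS[@]}"; do
        monthly_counts "$ip"
    done
}
```

Running collect_all >> /home/secmgr/attmrms1/data_tripwire1.sh appends the same 42 lines as the original script (6 servers × 7 months), grouped by server.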

Contents of data_tripwire.sh:

91
58
54
108
52
18
8
81
103
110
129
137
84
15
14
18
11
17
12
6
1
28
6
14
8
8
0
0
28
24
25
23
21
13
9
4
18
17
18
30
13
3

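I want to process the first 6 entries of the output above (91, 58, 54, 108, 52, 18) and then break out of the loop. After that, it should continue with the next 6 entries and break out of the loop again, and so on.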

The problem now is that it reads all 42 numbers without ever breaking out of the loop.

This is the output of the table:
Tripwire

Month    CBS      GFS      HR       HR       Payroll   INCV
         cb2db1   gfs2db1  hr2web1  hrm2db1  hrm2db1a  incv2svr1
2013-07  85       76       12       28       26        4
2013-08  58       103      18       6        24        18
2013-09  54       110      11       14       25        17
2013-10  108      129      17       8        23        18
2013-11  52       137      12       8        21        30
2013-12  18       84       6        0        13        13
2014-01  8        16       1        0        9         3

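The problem now is that it reads all 42 numbers, from 85 through 3. I want a loop that runs from July to January for one server and then performs the mean and standard deviation calculations below (which are already written). Once that is finished, it should loop over the next 6 numbers for the next server and do exactly the same as in the first pass. I need help with a for loop that uses break or continue, or with any simpler kind of loop.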

Here is my standard deviation calculation:

count=0         # Number of data points; global.
SC=3            # Scale to be used by bc. three decimal places.
E_DATAFILE=90   # Data file error

## ----------------- Set data file ---------------------

if [ ! -z "$1" ]  # Specify filename as cmd-line arg?
then
  datafile="$1" #  ASCII text file,
else            #+ one (numerical) data point per line!
  datafile=/home/secmgr/attmrms1/data_tripwire1.sh
fi              #  See example data file, below.

if [ ! -e "$datafile" ]
then
  echo "\"$datafile\" does not exist!" >&2
  exit $E_DATAFILE
fi

Calculating the mean

arith_mean ()
{
  local rt=0         # Running total.
  local am=0         # Arithmetic mean.
  local ct=0         # Number of data points.

  while read value   # Read one data point at a time.
  do
    rt=$(echo "scale=$SC; $rt + $value" | bc)
    (( ct++ ))
  done

  am=$(echo "scale=$SC; $rt / $ct" | bc)

  echo $am; return $ct   # This function "returns" TWO values!
  #  Caution: This little trick will not work if $ct > 255!
  #  To handle a larger number of data points,
  #+ simply comment out the "return $ct" above.
} <"$datafile"   # Feed in data file.

sd ()
{
  mean1=$1  # Arithmetic mean (passed to function).
  n=$2      # How many data points.
  sum2=0    # Sum of squared differences ("variance").
  avg2=0    # Average of $sum2.

  sdev=0    # Standard Deviation.

  while read value   # Read one line at a time.
  do
    diff=$(echo "scale=$SC; $mean1 - $value" | bc)
    # Difference between arith. mean and data point.
    dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
    sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
  done

    avg2=$(echo "scale=$SC; $sum2 / $n" | bc)  # Avg. of sum of squares.
    sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
    echo $sdev                                 # Standard Deviation.

} <"$datafile"   # Rewinds data file.
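For reference, these are the formulas that arith_mean and sd implement. Note that sd divides by n, which gives the population standard deviation, rather than by n - 1, which would give the sample standard deviation. The second line works them through for the first six values of the data file, 91 58 54 108 52 18:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\bar{x} - x_i\right)^2}

\bar{x} = \frac{381}{6} = 63.5,
\qquad
\sigma = \sqrt{\frac{27.5^2 + 5.5^2 + 9.5^2 + 44.5^2 + 11.5^2 + 45.5^2}{6}}
       = \sqrt{\frac{5059.5}{6}} = \sqrt{843.25} \approx 29.0388
```

At scale=3, bc truncates rather than rounds, which is why answer 0 prints 29.038 for this row.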

Displaying the output

mean=$(arith_mean); count=$?   # Two returns from function!
std_dev=$(sd $mean $count)

echo
echo "<tr><th>Servers</th><th>Number of data points in \"$datafile\"</th> <th>Arithmetic mean (average)</th><th>Standard Deviation</th></tr>" >> "$HTML"
echo "<tr><td>cb2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>gfs2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hr2web1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>hrm2db1a<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML
echo "<tr><td>incv21svr1<td>$count<td>$mean<td>$std_dev</tr>" >> $HTML

echo

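I want to split the input into chunks of six entries and compute the arithmetic mean and standard deviation of entries 1..6, then of entries 7..12, then 13..18, and so on.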

This is the table output I want:

Tripwire

Month    CBS      GFS      HR       HR       Payroll   INCV
         cb2db1   gfs2db1  hr2web1  hrm2db1  hrm2db1a  incv2svr1
2013-07  85       76       12       28       26        4
2013-08  58       103      18       6        24        18
2013-09  54       110      11       14       25        17
2013-10  108      129      17       8        23        18
2013-11  52       137      12       8        21        30
2013-12  18       84       6        0        13        13
2014-01  8        16       1        0        9         3
*Standard deviation (7mths)
         31.172   35.559   5.248    8.935    5.799     8.580
*Mean (7mths)
         54.428   94.285   11.142   9.142    20.285    14.714
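One possible alternative does the whole job in a single awk pass. It assumes seven values per server, as the collection loop's for count in 6 5 4 3 2 1 0 implies, so one chunk of 7 corresponds to one server. chunk_rows is a hypothetical name. The server names come from the table above, and the HTML row format mirrors the question's echo "<tr><td>..." lines. Like sd, it divides by n. Note that awk's printf rounds where bc truncates, so the last digit can differ from the bc results.

```shell
#!/bin/bash
# Sketch: one awk pass over a one-value-per-line file, printing one HTML row
# per chunk of n values: count, mean and population standard deviation
# (divide by n, like sd() in the question). Usage: chunk_rows FILE [n]
# n defaults to 7 (assumed: seven months per server); names from the table.
chunk_rows () {
    awk -v n="${2:-7}" -v names="cb2db1 gfs2db1 hr2web1 hrm2db1 hrm2db1a incv2svr1" '
        BEGIN { split(names, name, " ") }
        NF {                               # skip blank lines
            v[++k] = $1
            if (k == n) emit()
        }
        END { if (k > 0) emit() }          # leftover partial chunk, if any
        function emit(   i, sum, sq, mean) {
            sum = 0; sq = 0
            for (i = 1; i <= k; i++) sum += v[i]
            mean = sum / k
            for (i = 1; i <= k; i++) sq += (v[i] - mean) ^ 2
            ++row
            printf "<tr><td>%s<td>%d<td>%.3f<td>%.3f</tr>\n",
                ((row in name) ? name[row] : "chunk" row), k, mean, sqrt(sq / k)
            k = 0
        }
    ' "$1"
}
```

For example, run chunk_rows /home/secmgr/attmrms1/data_tripwire1.sh 7 >> "$HTML".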

3 Answers:

Answer 0 (score: 2)

paste - - - - - - < data_tripwire.sh | while read -a values; do
    # values is an array with 6 values
    # ${values[0]} .. ${values[5]}
    arith_mean "${values[@]}"
done

This means you have to rewrite your functions so that they don't use read. Change

while read value

to

for value in "$@"

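@Matt, yes: change both functions to iterate over their arguments instead of reading from stdin. Then pass the data file (currently called "data_tripwire1.sh", a terrible file extension for data; use .txt or .dat) to paste, which reformats the data so that the first 6 values form the first row. Read each row into the array values (with read -a values) and call the functions: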

arith_mean () {
    local sum=$(IFS=+; echo "$*")
    echo "scale=$SC; ($sum)/$#" | bc
}
sd () {
    local mean=$1
    shift
    local sum2=0
    for i in "$@"; do
        sum2=$(echo "scale=$SC; $sum2 + ($mean-$i)^2" | bc)
    done
    echo "scale=$SC; sqrt($sum2/$#)"|bc
}

paste - - - - - - < data_tripwire1.sh | while read -a values; do
    mean=$(arith_mean "${values[@]}")
    sd=$(sd $mean "${values[@]}")
    echo "${values[@]} $mean $sd"
done | column -t

Output:

91  58  54   108  52   18   63.500  29.038
8   81  103  110  129  137  94.666  42.765
84  15  14   18   11   17   26.500  25.811
12  6   1    28   6    14   11.166  8.648
8   8   0    0    28   24   11.333  10.934
25  23  21   13   9    4    15.833  7.711
18  17  18   30   13   3    16.500  7.973
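This is why paste - - - - - - groups the data: each - operand makes paste read the next line of standard input, so n dashes join n consecutive lines into one row. The collection script writes 7 values per server (6 servers × 7 months = 42 lines). Six dashes therefore give 7 rows that cut across servers, while seven dashes would give one row per server. rows_of is a hypothetical helper that builds the dashes:

```shell
#!/bin/bash
# Sketch: every "-" operand makes paste read the next line of standard input,
# so n dashes join n consecutive lines into one space-separated row.
# rows_of is a hypothetical helper name.
rows_of () {
    local n=$1 k
    local -a dashes=()
    for ((k = 0; k < n; k++)); do
        dashes+=(-)
    done
    paste -d ' ' "${dashes[@]}"
}
```

For example, rows_of 7 < data_tripwire1.sh | while read -a values; do ...; done gives one iteration per server.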

Note that you don't need to return fancy values from the functions: you already know how many points you passed in.

Answer 1 (score: 0)

Now the functions read only 6 items from the data file at a time (block number $i).

arith_mean ()
{
  local rt=0         # Running total.
  local am=0         # Arithmetic mean.
  local ct=0         # Number of data points.

  while read value   # Read one data point at a time.
  do
    rt=$(echo "scale=$SC; $rt + $value" | bc)
    (( ct++ ))
  done

  am=$(echo "scale=$SC; $rt / $ct" | bc)

  echo $am; return $ct   # This function "returns" TWO values!
  #  Caution: This little trick will not work if $ct > 255!
  #  To handle a larger number of data points,
  #+ simply comment out the "return $ct" above.
} < <(awk -v block="$i" 'NR > (6* (block - 1)) && NR < (6 * block + 1) {print}' "$datafile")   # Feed in data file.

sd ()
{
  mean1=$1  # Arithmetic mean (passed to function).
  n=$2      # How many data points.
  sum2=0    # Sum of squared differences ("variance").
  avg2=0    # Average of $sum2.

sdev=0    # Standard Deviation.

  while read value   # Read one line at a time.
  do
    diff=$(echo "scale=$SC; $mean1 - $value" | bc)
    # Difference between arith. mean and data point.
    dif2=$(echo "scale=$SC; $diff * $diff" | bc) # Squared.
    sum2=$(echo "scale=$SC; $sum2 + $dif2" | bc) # Sum of squares.
  done

    avg2=$(echo "scale=$SC; $sum2 / $n" | bc)  # Avg. of sum of squares.
    sdev=$(echo "scale=$SC; sqrt($avg2)" | bc) # Square root =
    echo $sdev                                 # Standard Deviation.

} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1) {print}' "$datafile")   # Rewinds data file.

From main, you need to set which block to read:

# wc -l < file prints only the count (no file name whose digits could leak in).
for ((i=1; i <= $(wc -l < "$datafile") / 6; i++))
do
    mean=$(arith_mean); count=$?   # Two returns from function!
    std_dev=$(sd $mean $count)
done

Of course, it would be better to move wc -l outside the loop so that it runs faster. But you get the idea.

Because of a spacing problem, there was a syntax error between < and (: there must be no space between them. Sorry for the typo.

cat <(awk -F: '{print $1}' /etc/passwd) works, whereas

cat < (awk -F: '{print $1}' /etc/passwd)

fails with: syntax error near unexpected token `('
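Here is a small sketch of the mechanism this answer depends on. Redirections attached to a function definition are performed each time the function is called, so the process substitution is re-run with the current $i and $datafile. block_lines is a hypothetical name, and it reuses the answer's awk filter.

```shell
#!/bin/bash
# Sketch: a redirection written after a function's closing brace is performed
# (and expanded) every time the function is called, so the process
# substitution below sees the current values of $i and $datafile.
# block_lines is a hypothetical name; it prints block $i (6 lines per block).
block_lines ()
{
  cat
} < <(awk -v block="$i" 'NR > (6 * (block - 1)) && NR < (6 * block + 1)' "$datafile")
```

Setting i=1 and calling block_lines prints lines 1 to 6. Setting i=2 and calling it again prints lines 7 to 12, with no change to the function.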

Answer 2 (score: 0)

Building on Glenn's answer, I came up with this, which needs only minimal changes to the original version:

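# Requires removing the '<"$datafile"' redirections after the closing braces
# of arith_mean and sd; otherwise they ignore the pipe and read the whole file.
paste - - - - - - < data_tripwire.sh | while read -a values
  do
    mean=$(for value in "${values[@]}"
      do
        echo "$value"
      done | arith_mean); count=$?
    std_dev=$(for value in "${values[@]}"
      do
        echo "$value"
      done | sd "$mean" "$count")
    echo "$mean $std_dev"
  done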
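The inner for/echo loops above simply print one array element per line. printf reuses its format string for every remaining argument, so a single printf call produces the same stream. lines is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: printf repeats its format for each argument, so this prints every
# argument on its own line, like "for value in ...; do echo "$value"; done".
# lines is a hypothetical helper name.
lines () {
    printf '%s\n' "$@"
}
```

For example: mean=$(lines "${values[@]}" | arith_mean); count=$?. This only helps if arith_mean reads from standard input, that is, without the <"$datafile" redirection on its definition.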

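You can type (or copy and paste) this code directly into an interactive shell in which arith_mean, sd and SC are already defined, and it should work out of the box. Of course, that is impractical if you plan to use it often, so you can put the code in a text file, make the file executable, and run it as a shell script. In that case, add #!/bin/bash as the first line of the file.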

Thanks to Glenn Jackman for paste - - - - - -, which in my view is the real solution.