Question

免责声明：

    1) English is my second language, so please forgive any grammatical horrors you may find. I am pretty confident you will be able to understand what I need despite these.
    2) I have found several examples in this site that address questions/problems similar to mine, though I was unfortunately not able to figure out the modifications that would need to be introduced to fit my needs.

“问题”：

我有一个如下所示的CSV文件：

c1,c2,c3,c4,c5,134.6,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER1,c11
c1,c2,c3,c4,c5,0.18,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER2,c11
c1,c2,c3,c4,c5,416.09,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,0,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,12.1,,c8,c9,SERVER3,c11
c1,c2,c3,c4,c5,480.64,,c8,c9,SERVER4,c11
c1,c2,c3,c4,c5,,83.65,c8,c9,SERVER5,c11
c1,c2,c3,c4,c5,,253.15,c8,c9,SERVER6,c11
c1,c2,c3,c4,c5,,18.84,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,8.12,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,22.45,c8,c9,SERVER7,c11
c1,c2,c3,c4,c5,,117.81,c8,c9,SERVER8,c11
c1,c2,c3,c4,c5,,96.34,c8,c9,SERVER9,c11

补充事实：

    1) File has 11 columns.
    2) The data in columns 1, 2, 3, 4, 5, 8, 9 and 11 is irrelevant in this case. In other words, I will only work with columns 6, 7 and 10.
    3) Column 10 will be typically alphanumeric strings (server names), though it may contain also "-" and/or "_".
    4) Columns 6 and 7 will have exclusively numbers, with up to two decimal places (A possible value is 0). Only one of the two will have data per line, never both.

我需要的是输出：

    - A single occurrence of every string in column 10 (as column 1), then the sum (subtotal) of it's values in column 6 (as column 2) and last, the sum (subtotal) of it's values in column 7 (as column 3).
    - If the total for a field is "0" the field must be left empty, but still must exist (it's respective comma has to be printed).
    - **Note** that the strings in column 10 will be already alphabetically sorted, so there is no need to do that part of the processing with AWK.

输出样本，使用上面的示例作为输入：

SERVER1,134.6,,
SERVER2,0.18,,
SERVER3,428.19,,
SERVER4,480.64,,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,26.96

我已经在这些页面中找到了不是一个，而是两个AWK oneliners，它们可以完成它所需的一切：

awk -F "," 'NR==1{last=$10; sum=0;}{if (last != $10) {print last "," sum; last=$10; sum=0;} sum += $6;}END{print last "," sum;}' inputfile


awk -F, '{a[$10]+=$6;}END{for(i in a)print i","a[i];}' inputfile

两种情况下我的“问题”都是一样的：

    - Subtotals of 0 are printed.
    - I can only handle the sum of one column at a time. Whenever I try to add the second one, I get either a syntax error or it does simply not print the third column at all.

提前感谢您的支持人员！问候，马丁

Answer 1

这样的事情？

$ awk 'BEGIN{FS=OFS=","} 
            {s6[$10]+=$6; s7[$10]+=$7} 
         END{for(k in s6) print k,(s6[k]?s6[k]:""),(s7[k]?s7[k]:"")}' file | sort

SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

请注意，您对逗号的处理不一致，当最后一个字段为零时，您需要添加额外的一个（计算逗号）

Answer 2

您发布的预期输出似乎与您发布的示例输入不匹配，因此我们猜测但这可能是您正在寻找的内容：

$ cat tst.awk
BEGIN { FS=OFS="," }
$10 != prev {
    if (NR > 1) {
        print prev, sum6, sum7
    }
    sum6 = sum7 = ""
    prev = $10
}
$6  { sum6 += $6 }
$7  { sum7 += $7 }
END { print prev, sum6, sum7 }

$ awk -f tst.awk file
SERVER1,134.6,
SERVER2,0.18,
SERVER3,428.19,
SERVER4,480.64,
SERVER5,,83.65
SERVER6,,253.15
SERVER7,,49.41
SERVER8,,117.81
SERVER9,,96.34

在CSV文件中，使用KSH中的AWK，基于第三个列小计2列

2 个答案: