Question

DATE_ID TOT_OP_MAIN_PAID_MNT
20140201        0
20140201          -100
20140201        -9
20140201        0
20140201        0
20140201        0
20140201        0
20140201        -127.5
20140201        0
20140201        -126.4
20140201        0
20140202             23
20140202             -233
20140202             0
20140203             55
20140203             90
20140203            -13

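I have millions of records in txt files like the one shown above. Each txt file contains daily records, and I would like to know how to output the sum of TOT_OP_MAIN_PAID_MNT for each day, plus the grand total across all files at the end: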

Here is an example of the output I expect:

DATE_ID   TOTAL
20140201  -362.9
20140202  -210
20140203  132

          -440.9

The awk code I am using:

awk -F, 'FNR == 5 {print $1} { sum += $6 } END { print sum }' CAS01.txt CAS02.txt CAS03.txt

But the output I get looks like this:

20140201
20140202
20140228
-1.7445e+09

Answer 1

I would go for something like this:

awk 'NR==1 {next}
     {a[$1]+=$2}
     END {for (i in a) {print i, a[i]; tot+=a[i]} 
          print "TOTAL", tot}' file

For your given input, it returns:

20140201 -362.9
20140202 -210
20140203 132
TOTAL -440.9

Explanation

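NR==1 {next} skips the first line (the header).
{a[$1]+=$2} maintains an array a in which a[day] holds the running sum for that day.
END {} runs once all input has been read and prints the results.
for (i in a) {print i, a[i]; tot+=a[i]} prints the total for each day and accumulates the grand total in tot.
print "TOTAL", tot prints the grand total.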
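
The question passes several files at once, and NR==1 only skips the very first line of the first file. A minimal sketch of a multi-file variant, assuming every file starts with its own header line (the file names day1.txt and day2.txt and the function name sum_files are made up for illustration):

```shell
# Sketch: per-day sums across several files, skipping each file's header.
# FNR restarts at 1 for every input file, whereas NR keeps counting.
sum_files() {
    awk 'FNR==1 {next}
         {a[$1]+=$2}
         END {for (i in a) {print i, a[i]; tot+=a[i]}
              print "TOTAL", tot}' "$@"
}

# Two small sample files (hypothetical names and data).
tmpdir=$(mktemp -d)
printf 'DATE_ID TOT_OP_MAIN_PAID_MNT\n20140201 0\n20140201 -100\n20140201 -9\n' > "$tmpdir/day1.txt"
printf 'DATE_ID TOT_OP_MAIN_PAID_MNT\n20140202 23\n20140202 -233\n' > "$tmpdir/day2.txt"

result=$(sum_files "$tmpdir/day1.txt" "$tmpdir/day2.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The per-day lines may come out in any order, because for (i in a) does not guarantee one; the TOTAL line is always last.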

If you want to keep the header, you can store it when NR==1:

$ awk 'NR==1 {header=$0; next} {a[$1]+=$2} END {print header; for (i in a) {print i, a[i]; tot+=a[i]} print "TOTAL", tot}' a | column -t
DATE_ID   TOT_OP_MAIN_PAID_MNT
20140201  -362.9
20140202  -210
20140203  132
TOTAL     -440.9

Answer 2

This should help you.

#!/usr/bin/awk -f

BEGIN {
    print("DATE_ID", "TOTAL")
}

$1 ~ /[[:digit:]]/ {
    a[$1]+=$2;total+=$2
}

END{
    for(i in a) {
        print i,a[i]
    }
    print "\t"total
}

As a one-liner:

$ awk 'BEGIN {print("DATE_ID", "TOTAL")} $1 ~ /[[:digit:]]/ {a[$1]+=$2;total+=$2} END{for(i in a) {print i,a[i]}; print "\t"total}' file.txt

Output:

DATE_ID TOTAL
20140202 -210
20140203 132
20140201 -362.9
         -440.9
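
The days in the output above are not in date order because for (i in a) visits the array keys in an unspecified order. One possible way to get sorted output is to pipe the result through sort. This is only a sketch: the function name sum_by_day is made up, and it relies on the C locale placing digits before the letters of "TOTAL" so that the grand total stays on the last line.

```shell
# Sketch: per-day totals sorted by date, grand total last.
sum_by_day() {
    # Only lines whose first field is all digits are summed, so headers are ignored.
    awk '$1 ~ /^[0-9]+$/ {a[$1]+=$2; total+=$2}
         END {for (i in a) print i, a[i]
              print "TOTAL", total}' "$@" |
    LC_ALL=C sort   # in the C locale "2014..." sorts before "TOTAL"
}

# Usage with a few lines on standard input (a file name argument works too):
printf '%s\n' 'DATE_ID TOT_OP_MAIN_PAID_MNT' \
    '20140202 23' '20140202 -233' '20140201 -100' '20140201 -9' '20140203 55' |
    sum_by_day
# prints:
# 20140201 -109
# 20140202 -210
# 20140203 55
# TOTAL -264
```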

Answer 3

Maybe the following shell script is what you need:

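#!/bin/bash
# For each file: echo its lines, then print the sum of every column.
for file in "$@"
do
awk '{
    print $0
    for(i=1;i<=NF;i++)
        sum[i]+=$i
    if (NF > n) n = NF    # remember the widest record seen
}
END {
    for(i=1;i<=n;i++)
        printf("%.2f ", sum[i])
    print ""
}' "$file"
done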

Save it as col_sum.sh, then run sh col_sum.sh a.txt b.txt c.txt in a Linux shell.

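Note: when the sum is too large, print outputs it in scientific notation. What you need is format control, that is, printf with an explicit format!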
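
To illustrate the note: print formats non-integer numbers with OFMT, which defaults to "%.6g", so a large sum collapses to something like the -1.7445e+09 seen in the question. A minimal sketch of the difference, assuming the amount is in the second column and each file has a header line (the function name sum_column and the sample values are made up):

```shell
# print uses OFMT ("%.6g" by default) for non-integer values:
plain=$(awk 'BEGIN {x = -1744500000.5; print x}')
echo "$plain"        # prints -1.7445e+09

# Sketch: sum column 2 and print it with two fixed decimals instead.
sum_column() {
    awk 'FNR==1 {next} {sum += $2} END {printf "%.2f\n", sum}' "$@"
}

printf 'DATE_ID AMT\n20140201 -1744500000.25\n20140201 -0.25\n' | sum_column
# prints -1744500000.50
```

Setting OFMT (for example with -v OFMT='%.2f') is another option if you prefer to keep using print.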
