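Calculating statistics directly from a CSV file

Time: 2010-04-16 19:21:51

Tags: bash csv

I have a transaction log file in CSV format that I'd like to run statistics on. The log has the following fields: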

date:  Time/date stamp
salesperson:  The username of the person who closed the sale
promo:  Sum total of items in the sale that were promotions.
amount:  grand total of the sale

I'd like to get the following statistics:

salesperson:  The username of the salesperson being analyzed.
minAmount:  The smallest grand total among this salesperson's transactions.
avgAmount:  The mean grand total.
maxAmount:  The largest grand total.
minPromo:  The smallest promo amount for this salesperson.
avgPromo:  The mean promo amount.

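I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.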

3 Answers:

Answer 0 (score: 3)

TxtSushi does this:

tssql -table trans transactions.csv \
'select
    salesperson,
    min(as_real(amount)) as minAmount,
    avg(as_real(amount)) as avgAmount,
    max(as_real(amount)) as maxAmount,
    min(as_real(promo)) as minPromo,
    avg(as_real(promo)) as avgPromo
from trans
group by salesperson'

I have a bunch of example scripts showing how to use it.

Edit: fixed syntax

Answer 1 (score: 2)

You could also bang out an awk script to do it. It's only a CSV with a few variables.
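As a rough sketch of what such an awk script might look like: it assumes the columns appear in the order date, salesperson, promo, amount, that the first line is a header, and that no field is quoted. The function name sales_stats is made up for illustration.

```bash
# Sketch: per-salesperson min/avg/max of amount and min/avg of promo.
# Assumes columns date,salesperson,promo,amount, a header row,
# and no quoted fields. sales_stats is a hypothetical helper name.
sales_stats() {
    awk -F',' '
    NR == 1 { next }                       # skip the (assumed) header row
    {
        sp = $2; promo = $3 + 0; amount = $4 + 0
        if (!(sp in count)) {              # first row seen for this salesperson
            minAmount[sp] = amount
            maxAmount[sp] = amount
            minPromo[sp] = promo
        }
        if (amount < minAmount[sp]) minAmount[sp] = amount
        if (amount > maxAmount[sp]) maxAmount[sp] = amount
        if (promo < minPromo[sp]) minPromo[sp] = promo
        sumAmount[sp] += amount            # running totals for the means
        sumPromo[sp] += promo
        count[sp]++
    }
    END {
        print "salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo"
        for (sp in count) {
            printf "%s,%.2f,%.2f,%.2f,%.2f,%.2f\n", sp,
                minAmount[sp], sumAmount[sp] / count[sp], maxAmount[sp],
                minPromo[sp], sumPromo[sp] / count[sp]
        }
    }' "$1"
}
```

Run it as sales_stats transactions.csv. The salesperson rows come out in no particular order, since awk does not guarantee an iteration order for "for (sp in count)".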

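Answer 2 (score: 1)

You can loop over the lines in the CSV and use bash script variables to hold the min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).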
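A minimal sketch of that loop for a single salesperson, under some loud assumptions: the columns are date, salesperson, promo, amount; the first line is a header; no field is quoted; and amounts are whole numbers, because shell arithmetic is integer-only (decimal amounts would need awk or bc instead). The name amount_stats is hypothetical.

```bash
# Sketch: min, mean and max of the amount column for one salesperson.
# Usage: amount_stats FILE USERNAME   (amount_stats is a hypothetical name)
# Assumes a header row, no quoted fields, and whole-number amounts.
# The body runs in a subshell ( ... ) so its variables do not leak out.
amount_stats() (
    file=$1
    who=$2
    count=0
    sum=0
    min=
    max=
    {
        read -r header                       # discard the header row
        while IFS=, read -r date sp promo amount; do
            [ "$sp" = "$who" ] || continue   # keep only the requested salesperson
            if [ "$count" -eq 0 ] || [ "$amount" -lt "$min" ]; then min=$amount; fi
            if [ "$count" -eq 0 ] || [ "$amount" -gt "$max" ]; then max=$amount; fi
            sum=$((sum + amount))            # running total for the mean
            count=$((count + 1))
        done
    } < "$file"
    [ "$count" -gt 0 ] || { echo "no rows for $who" >&2; exit 1; }
    # Integer division truncates the mean.
    echo "$who,$min,$((sum / count)),$max"
)
```

Calling amount_stats transactions.csv alice prints one line of the form username,min,mean,max. The same pattern extends to the promo column by tracking a second set of variables.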

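Here are some useful snippets for processing CSV files in bash.

If your data may be quoted (for example, because a field contains a comma), processing it with bash, sed, etc. becomes considerably more complicated.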
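To illustrate why quoting complicates things, here is one possible sketch that walks each line character by character in awk and rewrites it as tab-separated fields, which simpler tools can then split safely. It assumes RFC 4180 style quoting (a doubled "" inside a quoted field stands for one quote character), that no field spans multiple lines, and that no field contains a tab. The name csv_to_tsv is made up for this example.

```bash
# Sketch: convert quoted CSV lines to tab-separated fields.
# Assumes RFC 4180 style quoting, no embedded newlines, no tabs in the data.
# csv_to_tsv is a hypothetical helper name.
csv_to_tsv() {
    awk '
    # Split one CSV line into out[1..n]; returns n.
    function split_csv(line, out,    n, i, c, inq, field) {
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr(line, i + 1, 1) == "\"") {
                        field = field "\""   # doubled quote is a literal quote
                        i++
                    } else {
                        inq = 0              # closing quote
                    }
                } else {
                    field = field c          # commas are kept while inside quotes
                }
            } else if (c == "\"") {
                inq = 1                      # opening quote
            } else if (c == ",") {
                out[++n] = field             # unquoted comma ends the field
                field = ""
            } else {
                field = field c
            }
        }
        out[++n] = field
        return n
    }
    {
        n = split_csv($0, f)
        for (i = 1; i <= n; i++) printf "%s%s", f[i], (i < n ? "\t" : "\n")
    }' "$1"
}
```

The output can then be processed with a tab as the field separator, for example with awk -F'\t'.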