Question

I need to know how to sum all the numbers in a column of a CSV file.

For example, my data looks like this:

column  count   min max sum mean
80  29573061    2   40  855179253   28.92
81  28861459    2   40  802912711   27.82
82  28165830    2   40  778234605   27.63
83  27479902    2   40  754170015   27.44
84  26800815    2   40  729443846   27.22
85  26127825    2   40  701704155   26.86
86  25473985    2   40  641663075   25.19
87  24827383    2   40  621981569   25.05
88  24189811    2   40  602566423   24.91
89  23566656    2   40  579432094   24.59
90  22975910    2   40  553092863   24.07
91  22412345    2   40  492993262   22
92  21864206    2   40  475135290   21.73
93  21377772    2   40  461532152   21.59
94  20968958    2   40  443921856   21.17
95  20593463    2   40  424887468   20.63
96  20329969    2   40  364319592   17.92
97  20157643    2   40  354989240   17.61
98  20104046    2   40  349594631   17.39
99  20103866    2   40  342152213   17.02
100 20103866    2   40  335379448   16.6
# In the actual file the columns are separated by tabs

The code I have written so far is:

import sys
import csv

def ErrorCalculator(file):
        reader = csv.reader(open(file), dialect='excel-tab' )

        for row in reader:
                PxCount = 10**(-float(row[5])/10)*float(row[1])


if __name__ == '__main__':
        ErrorCalculator(sys.argv[1])

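For this particular code, I need to add up all the PxCount values and divide the result by the sum of all the numbers in row[1]...

I would be very grateful if you could tell me how to sum the numbers in a column, or help me with this code.

A tip on how to skip the header row would also be appreciated.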

Answer 1

You can use "augmented assignment" (+=) to keep a running total:

total=0
for row in reader:
        PxCount = 10**(-float(row[5])/10)*float(row[1])
        total+=PxCount
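
The question also asks to divide that total by the sum of the second column. Below is a minimal sketch of one way to do that, keeping two running totals side by side. The function name `weighted_ratio` and the in-memory sample (three header-less rows copied from the question) are assumptions made for illustration only.

```python
import csv
import io

# Three header-less rows from the question's data, tab-separated (illustrative sample).
SAMPLE = (
    "80\t29573061\t2\t40\t855179253\t28.92\n"
    "81\t28861459\t2\t40\t802912711\t27.82\n"
    "82\t28165830\t2\t40\t778234605\t27.63\n"
)

def weighted_ratio(lines):
    """Return sum(PxCount) / sum(count) over tab-separated rows (hypothetical helper)."""
    reader = csv.reader(lines, dialect='excel-tab')
    px_total = 0.0     # running total of PxCount
    count_total = 0.0  # running total of column 1 (count)
    for row in reader:
        count = float(row[1])
        px_total += 10 ** (-float(row[5]) / 10) * count
        count_total += count
    return px_total / count_total

ratio = weighted_ratio(io.StringIO(SAMPLE))
print(ratio)
```

The sketch assumes at least one data row; an empty input would raise ZeroDivisionError.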

To skip the first line (the header) of the CSV file:

with open(file) as f:
    next(f)  # read and throw away first line in f
    reader = csv.reader(f, dialect='excel-tab' )
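
Putting the header skip together with a running total, a column sum might be sketched as follows. The helper name `column_sum` and the temporary demo file are hypothetical; only `next`, `open` and `csv.reader` come from the standard library.

```python
import csv
import os
import tempfile

def column_sum(path, index):
    """Sum one column of a tab-separated file, skipping the header line."""
    total = 0.0
    with open(path, newline='') as f:
        next(f, None)  # discard the header; the default avoids StopIteration on an empty file
        for row in csv.reader(f, dialect='excel-tab'):
            total += float(row[index])
    return total

# Demo on a small temporary file: the header plus two rows from the question.
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False, newline='') as tmp:
    tmp.write("column\tcount\tmin\tmax\tsum\tmean\n")
    tmp.write("80\t29573061\t2\t40\t855179253\t28.92\n")
    tmp.write("81\t28861459\t2\t40\t802912711\t27.82\n")
demo_path = tmp.name

count_sum = column_sum(demo_path, 1)
os.remove(demo_path)
print(count_sum)  # 58434520.0
```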

Answer 2

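You can call reader.next() right after instantiating the reader to discard the first row (in Python 3 this is written next(reader)).

To sum PxCount, just set total = 0 before the loop and then do total += PxCount after computing it for each row. (A name like total is preferable to sum, which would shadow the built-in function.)

PS: You may also find csv.DictReader helpful.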
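
As a rough illustration of the DictReader suggestion, the sketch below sums the count column by name. DictReader consumes the header row itself, so no explicit skip is needed. The in-memory sample (the header plus the last two rows of the question's data) is an assumption made so the example is self-contained.

```python
import csv
import io

# Header plus the last two rows of the question's data, tab-separated (illustrative sample).
SAMPLE = (
    "column\tcount\tmin\tmax\tsum\tmean\n"
    "99\t20103866\t2\t40\t342152213\t17.02\n"
    "100\t20103866\t2\t40\t335379448\t16.6\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE), dialect='excel-tab')
# Each row is a dict keyed by the header names, so columns are addressed by name.
count_total = sum(int(row['count']) for row in reader)
print(count_total)  # 40207732
```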

Answer 3

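Using DictReader results in cleaner code. Decimal will give you better precision. Also try to follow Python naming conventions and use lowercase names for functions and variables.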

import csv
import decimal

def calculate(file):
    total_px = 0
    total_count = 0
    with open(file, newline='') as f:
        # DictReader reads the header row itself and uses it as the keys
        reader = csv.DictReader(f, dialect='excel-tab')
        for row in reader:
            r_count = decimal.Decimal(row['count'])
            r_mean = decimal.Decimal(row['mean'])
            # not sure if the below formula is actually what you want
            total_px += 10 ** (-r_mean / 10) * r_count
            total_count += r_count
    # the question asks to divide by the sum of the count column (row[1])
    return total_px / total_count
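
To make the precision remark concrete, here is a small sketch comparing float and Decimal accumulation. The values are made up for illustration and do not come from the question's data. Note that the benefit only appears when Decimal is built from strings (which is what the csv module yields), not from floats.

```python
from decimal import Decimal

float_total = sum([0.1] * 10)               # accumulates binary rounding error
decimal_total = sum([Decimal('0.1')] * 10)  # exact decimal arithmetic

print(float_total == 1.0)               # False
print(decimal_total == Decimal('1.0'))  # True
```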

How do I sum all the numbers in a column in Python?
