Question

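I have many CSV files, and I need to compute the average of each individual cell across them. Below is a simplified example of these CSV files. My actual files contain several value fields, but for simplicity I show only one.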

File0.csv:

Latitude, Longitude, Value
23, 97, 1
24, 97, 5
25, 97, 6
26, 97, 4

File1.csv:

Latitude, Longitude, Value
23, 97, 7
24, 97, 4
25, 97, 2
26, 97, 9

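Every file has the same latitudes/longitudes and the same number of rows and columns. I just need to create a new CSV containing the average value for each latitude/longitude.

An example of the desired output CSV: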

Latitude, Longitude, Value
23, 97, 4
24, 97, 4.5
25, 97, 4
26, 97, 6.5

One more note: my CSV files contain some NoData values (given as -999.9), which may cause problems when averaging.

Answer 1

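If you have experience with the PySpark or Pandas libraries, you can use their read_csv and groupby methods. Otherwise, another option is to open each file with open(), read it with Python's file I/O, and manually add the rows to a two-dimensional list while keeping track of how many values went into each entry. For example,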

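import csv

files = ["File0.csv", "File1.csv"]  # paths to the input files

values = []  # one [column_sums, count] entry per data row
for path in files:
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header line
        for counter, row in enumerate(reader):
            numbers = [float(x) for x in row]
            if counter >= len(values):
                # first time this row is seen: start its running sums
                values.append([numbers, 1])
            else:
                # add this file's row to the running sums and bump the count
                sums = [x + y for x, y in zip(values[counter][0], numbers)]
                values[counter] = [sums, values[counter][1] + 1]

# divide every running sum by the number of files that contributed to it
# (latitude and longitude are identical in every file, so they come out unchanged)
values = [[j / count for j in sums] for sums, count in values]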

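The overall idea is fairly simple; the code is a bit messy. If you are going to manipulate data this way often, I recommend PySpark or Pandas. Something that might take 20 lines of pure Python takes only 2-3 lines with those libraries.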
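To illustrate the read_csv and groupby route mentioned above, here is a minimal Pandas sketch. It is not from the original answer: the function name average_csvs is hypothetical, and it assumes the column names from the question (Latitude, Longitude) and the NoData marker -999.9. Treating the marker as missing lets mean() skip those cells.

```python
import pandas as pd

NODATA = -999.9  # NoData marker given in the question


def average_csvs(paths, nodata=NODATA):
    """Average every value column per (Latitude, Longitude) across files.

    Hypothetical helper: assumes each file has Latitude and Longitude columns.
    """
    frames = [
        # skipinitialspace drops the blank after each comma;
        # na_values turns the NoData marker into NaN while parsing
        pd.read_csv(p, skipinitialspace=True, na_values=[nodata])
        for p in paths
    ]
    combined = pd.concat(frames, ignore_index=True)
    # mean() ignores NaN, so NoData cells do not distort the average
    return combined.groupby(["Latitude", "Longitude"], as_index=False).mean()
```

Calling average_csvs(["File0.csv", "File1.csv"]).to_csv("average.csv", index=False) would then write the result; a cell that is NoData in every file stays empty in the output.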

Answer 2

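Read the CSV file:

def read_CSV_to_matrix(filename):
    matrix = []
    with open(str(filename)) as f:
        next(f)  # skip the header line
        for line in f:
            # convert every comma-separated field on the line to a number
            matrix.append([float(x) for x in line.strip().split(",")])
    return matrix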

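Then you iterate over the two matrices:

matrix1 = read_CSV_to_matrix("File0.csv")
matrix2 = read_CSV_to_matrix("File1.csv")

for index, line in enumerate(matrix1):
    # for each line in File0.csv, print elements 1 and 2 and the average
    # of element 3 across both matrices
    print(line[0], line[1], (line[2] + matrix2[index][2]) / 2)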
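The two-file loop above could be extended to any number of files and to the NoData values mentioned in the question. The following is only a sketch and not part of the original answer: average_files is a hypothetical name, it assumes at least one input file, that the value sits in the third column, and that -999.9 marks NoData.

```python
import csv

NODATA = -999.9  # NoData marker given in the question


def average_files(paths, out_path, nodata=NODATA):
    """Write per-row averages of the third column across all input files.

    Hypothetical helper: assumes at least one path and identical row order.
    """
    keys, sums, counts = [], [], []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f, skipinitialspace=True)
            header = next(reader)  # keep the header for the output file
            for i, row in enumerate(reader):
                if i == len(keys):  # first file: remember the coordinates
                    keys.append(row[:2])
                    sums.append(0.0)
                    counts.append(0)
                value = float(row[2])
                if value != nodata:  # leave NoData cells out of the average
                    sums[i] += value
                    counts[i] += 1
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for key, total, count in zip(keys, sums, counts):
            # a cell that is NoData in every file stays NoData
            writer.writerow(key + [total / count if count else nodata])
```

Dividing by the count of valid values rather than by the number of files is the design choice that keeps a single -999.9 from dragging an average down.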

Disclaimer: I am a newbie (only 2 years of experience, all of it with Python 3).

Calculating cell averages across multiple .csv files in Python
