Question

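Dear all,
I am a beginner in Python. I am looking for the best way to do the following in Python: suppose I have three text files, named A, B and C, each containing m rows and n columns of numbers. In what follows, the contents can be indexed as A[i][j], B[k][l], and so on. I need to compute the average of A[0][0], B[0][0] and C[0][0] and write it to file D at D[0][0], and likewise for all remaining entries. For example, let us suppose:

So file D should be: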

D:  
1     2.67   4    
2.33  3.33   4

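My actual files are of course larger than these, on the order of a few MB. I am not sure whether the best solution is to read the entire contents of all files into a nested structure indexed by filename, or to read each file line by line and compute the means as I go. After reading the manual, the fileinput module does not seem useful here, because it does not read lines "in parallel", as I need, but reads them "serially". Any guidance or advice is highly appreciated.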

Answer 1

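Take a look at numpy. It can read the three files into three arrays (using loadtxt), compute the average, and export the result to a text file (using savetxt). Note that fromfile and tofile treat files as raw binary by default, so they are not a good fit for text data.

import numpy as np

# loadtxt parses whitespace-separated text into a 2-D float array;
# pass delimiter=',' if the files are comma-separated.
a = np.loadtxt('A.csv')
b = np.loadtxt('B.csv')
c = np.loadtxt('C.csv')

# Element-wise mean of the three arrays.
d = (a + b + c) / 3.0

# Write the result as text, two decimal places per value.
np.savetxt('D.csv', d, fmt='%.2f')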

A size of "a few MB" should not be a problem.
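As a hedged sketch of the NumPy approach for text input: the block below assumes whitespace-separated files of identical shape and relies on np.loadtxt and np.savetxt. The helper name average_files, the file names, and the sample data are made up for illustration.

```python
import os
import tempfile

import numpy as np


def average_files(paths, out_path, fmt="%.2f"):
    """Average same-shaped numeric text files element-wise (hypothetical helper)."""
    # ndmin=2 keeps single-row files two-dimensional.
    arrays = [np.loadtxt(p, ndmin=2) for p in paths]
    # Stack into shape (n_files, m, n) and average over the file axis.
    mean = np.mean(np.stack(arrays), axis=0)
    np.savetxt(out_path, mean, fmt=fmt)
    return mean


# Small demonstration with made-up data written to a temporary directory.
tmpdir = tempfile.mkdtemp()
samples = {
    "A.dat": "1 2 3\n4 5 6\n",
    "B.dat": "0 1 3\n1 1 1\n",
    "C.dat": "2 5 6\n1 0 2\n",
}
for name, text in samples.items():
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write(text)

paths = [os.path.join(tmpdir, name) for name in samples]
result = average_files(paths, os.path.join(tmpdir, "D.dat"))
```

Stacking the arrays means the same function works for any number of input files, not just three.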

Answer 2

If these are text files, try the following:

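def readdat(data, sep=','):
    # Parse delimited text into a list of rows, each a list of floats.
    rows = []
    for line in data.split('\n'):
        if line.strip():  # skip blank lines such as a trailing newline
            rows.append([float(x) for x in line.split(sep)])
    return rows

def formatdat(data, sep=','):
    # Turn a list of rows back into delimited text, one row per line.
    lines = []
    for row in data:
        lines.append(sep.join(str(x) for x in row))
    return '\n'.join(lines)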

Then use these functions to convert the text into lists and back again.
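One way helpers like readdat and formatdat might be combined to get the averages is sketched below. The two functions are restated in compact form so the block runs on its own; average_tables and the sample strings are hypothetical and stand in for the contents of three files.

```python
def readdat(data, sep=","):
    """Parse delimited text into a list of rows of floats."""
    return [[float(x) for x in line.split(sep)]
            for line in data.split("\n") if line.strip()]


def formatdat(rows, sep=","):
    """Render a list of rows back into delimited text."""
    return "\n".join(sep.join(str(x) for x in row) for row in rows)


def average_tables(tables):
    """Element-wise mean of equally shaped tables (hypothetical helper)."""
    # The outer zip groups the rows at the same index across tables;
    # the inner zip groups the cells at the same column within those rows.
    return [[sum(cells) / len(cells) for cells in zip(*rows)]
            for rows in zip(*tables)]


# Made-up inputs standing in for the contents of three files.
a = readdat("1,2,3\n4,5,6")
b = readdat("0,1,3\n1,1,1")
c = readdat("2,5,6\n1,0,2")
d_text = formatdat(average_tables([a, b, c]))
```

In practice the strings would come from open(name).read(), and d_text would be written to the output file.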

Answer 3

For reference, here is how you could do the same thing without numpy (less elegant, but more flexible):

files = zip(open("A.dat"), open("B.dat"), open("C.dat"))
outfile = open("D.dat","w")
for rowgrp in files:     # e.g.("1 2 3\n", "0 1 3\n", "2 5 6\n")
    intsbyfile = [[int(a) for a in row.strip().split()] for row in rowgrp]
                         # [[1,2,3], [0,1,3], [2,5,6]]
    intgrps = zip(*intsbyfile) # [(1,0,2), (2,1,5), (3,3,6)]
    # use float() to ensure we get true division in Python 2.
    averages = [float(sum(intgrp))/len(intgrp) for intgrp in intgrps]
    outfile.write(" ".join(str(a) for a in averages) + "\n")

In Python 3, zip is lazy and reads the files only as needed. In Python 2, if the files are too large to load into memory, use itertools.izip instead.
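The same streaming idea can be written for Python 3 so that every file is closed reliably. This is a sketch under assumptions: whitespace-separated numeric files with equal row and column counts. The function name average_rows, the two-decimal output format, and the sample data are illustrative choices, not part of the answer above.

```python
import os
import tempfile
from contextlib import ExitStack


def average_rows(in_paths, out_path):
    """Stream row-wise averages across files without loading them whole."""
    with ExitStack() as stack:
        # ExitStack closes every opened file, even if an error occurs.
        infiles = [stack.enter_context(open(p)) for p in in_paths]
        outfile = stack.enter_context(open(out_path, "w"))
        # In Python 3, zip pulls one line from each file per iteration.
        for rowgrp in zip(*infiles):
            values = [[float(x) for x in row.split()] for row in rowgrp]
            averages = [sum(col) / len(col) for col in zip(*values)]
            outfile.write(" ".join("%.2f" % a for a in averages) + "\n")


# Demonstration with made-up data in a temporary directory.
tmpdir = tempfile.mkdtemp()
contents = ["1 2 3\n4 5 6\n", "0 1 3\n1 1 1\n", "2 5 6\n1 0 2\n"]
in_paths = []
for i, text in enumerate(contents):
    path = os.path.join(tmpdir, "in%d.dat" % i)
    with open(path, "w") as fh:
        fh.write(text)
    in_paths.append(path)

out_path = os.path.join(tmpdir, "D.dat")
average_rows(in_paths, out_path)
with open(out_path) as fh:
    output_text = fh.read()
```

Only one line per input file is held in memory at a time, so this scales to files far larger than a few MB.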

Computing the average of records from multiple files with Python
