Python - Filter lines from a file before computing

Time: 2017-08-28 10:16:38

Tags: python python-2.7 file average nan

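The following code computes the average of each column in an input file. It works until the file contains nan values, at which point the average itself becomes nan.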

Here is my code:

with open(biasfile, 'r') as f:
    data = [map(float, line.split()) for line in f]

num_rows = len(data)
num_cols = len(data[0])

totals = num_cols * [0.0]

for line in data:
    for index in xrange(num_cols):
        totals[index] += line[index]

averages = [total / num_rows for total in totals]
print averages

Here is part of the file:

 22.7061 5.4303
 32.2040 5.4364
 22.9982 5.4426
 nan 5.4487
 nan 5.4548
 nan 5.4610

Here is the output:

[nan, 3.1446607421875]

I want to ignore the nan values and compute the average of the remaining values. How can I do that?
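For context, the first column turns into nan because any arithmetic involving a NaN yields NaN, so a single nan added to a running total poisons the whole sum. A minimal sketch of that behaviour, using first-column values from the sample file (the variable names are illustrative only):

```python
import math

# First-column values from the sample file; float() parses the string 'nan'.
values = [float(s) for s in ['22.7061', '32.2040', '22.9982', 'nan']]

total = 0.0
for v in values:
    total += v  # once a NaN is added, the total stays NaN

print(math.isnan(total))  # True

# NaN never compares equal to anything, itself included,
# so math.isnan() is the reliable way to detect it.
clean = [v for v in values if not math.isnan(v)]
print(len(clean))  # 3
```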

2 answers:

Answer 0: (score: 1)

You can use Python list comprehensions to filter the data:

with open('file.txt') as file:
    data = [line.split() for line in file]
    data = [item for item in data if 'nan' not in item]
    data = [[float(x) for x in item] for item in data]  # real lists on both Python 2 and 3

totals = len(data[0]) * [0.0]

for item in data:
    for k, n in enumerate(item):
        totals[k] += n

print([total / len(data) for total in totals])

Another approach:

with open('file.txt') as file:
    data = [line.split() for line in file]
    data = [item for item in data if 'nan' not in item]
    data = [[float(x) for x in item] for item in data]  # lists, so d[k] below also works on Python 3

print([sum(d[k] for d in data) / len(data) for k in range(len(data[0]))])
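Note that both snippets above discard an entire row as soon as any field is nan, so the 5.4487, 5.4548 and 5.4610 values in the second column of the sample are dropped as well. If the goal is to skip only the individual nan entries, one option is to keep a separate count per column. The following is a minimal sketch under that assumption; `column_averages` is a hypothetical helper name, and it assumes every row has the same number of whitespace-separated fields:

```python
import math


def column_averages(lines):
    """Average each column, skipping only the individual NaN entries."""
    totals = []
    counts = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if not totals:
            # Size the accumulators from the first non-empty row.
            totals = [0.0] * len(fields)
            counts = [0] * len(fields)
        for k, field in enumerate(fields):
            value = float(field)
            if math.isnan(value):
                continue  # skip this entry but keep the rest of the row
            totals[k] += value
            counts[k] += 1
    # A column with no valid entries has no meaningful average.
    return [t / c if c else float('nan') for t, c in zip(totals, counts)]


# The sample lines from the question stand in for the real file here.
sample = [
    ' 22.7061 5.4303',
    ' 32.2040 5.4364',
    ' 22.9982 5.4426',
    ' nan 5.4487',
    ' nan 5.4548',
    ' nan 5.4610',
]
averages = column_averages(sample)
print(averages)
```

With a real file, the open file object can be passed in directly, since iterating over it yields one line at a time.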

Answer 1: (score: 0)

Could you use the DataFrame API and do something like the following:

dataFrame.map(x => if (!x.isNaN) x).avg
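The one-liner above reads like Scala-style pseudocode rather than Python, and the answer does not say which DataFrame library it has in mind. As a hedged illustration of the same filter-then-average idea in plain Python, with no DataFrame library assumed, something like the following could work. The names `rows` and `nan_free_mean` are made up for this sketch, and the `statistics` module requires Python 3.4 or later, so this does not apply to the Python 2.7 setup in the question as-is:

```python
import math
from statistics import mean

# Already-parsed rows; float('nan') stands in for the 'nan' fields.
rows = [
    [22.7061, 5.4303],
    [32.2040, 5.4364],
    [float('nan'), 5.4487],
]


def nan_free_mean(column):
    """Drop NaN entries from one column, then average what is left."""
    kept = [x for x in column if not math.isnan(x)]
    return mean(kept) if kept else float('nan')


# zip(*rows) transposes the list of rows into columns.
column_means = [nan_free_mean(col) for col in zip(*rows)]
print(column_means)
```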