I have 100 CSV files with the same format, and each contains only two values, mean and std:
file1.csv
mean 0.21
std 0.54
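I need to extract the mean and the standard deviation from every CSV file and then compute the overall averages, i.e. (mean [mean1, mean2, ...]) and (mean [std1, std2, ...]). Copying each file's mean and std by hand, one file at a time, and then averaging them all is far too tedious.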
Answer 0 (score: 1)
Assuming the file names are in your_files:
means, deviations = [], []
for file_name in your_files:
    with open(file_name) as f:
        # Second whitespace-separated field of each line, as a float
        lines = (float(line.split()[1]) for line in f)
        means.append(next(lines))       # first line: mean
        deviations.append(next(lines))  # second line: std
Then you can compute the averages with the usual formula, e.g. sum(means) / len(means).
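As a sketch of that last step, the loop above can be wrapped in a function that returns both averages, using the standard library's `statistics.mean` (the ordinary arithmetic mean). The helper name `average_stats` and the two sample files written to a temporary directory are illustrative assumptions, not part of the original answer.

```python
import statistics
import tempfile
from pathlib import Path


def average_stats(file_names):
    """Return (mean of means, mean of stds) for the given files.

    Hypothetical helper: assumes each file holds exactly two
    whitespace-separated lines, 'mean <value>' followed by 'std <value>'.
    """
    means, deviations = [], []
    for file_name in file_names:
        with open(file_name) as f:
            lines = (float(line.split()[1]) for line in f)
            means.append(next(lines))       # first line: mean
            deviations.append(next(lines))  # second line: std
    # statistics.mean computes sum(values) / len(values)
    return statistics.mean(means), statistics.mean(deviations)


# Demo: two sample files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "file1.csv": "mean 0.21\nstd 0.54\n",
        "file2.csv": "mean 0.23\nstd 0.56\n",
    }
    your_files = []
    for name, text in samples.items():
        path = Path(tmp) / name
        path.write_text(text)
        your_files.append(path)
    overall_mean, overall_std = average_stats(your_files)

print(overall_mean, overall_std)
```

With real data, pass the list of your 100 file names instead of the sample paths.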
Answer 1 (score: 1)
I'd call this the "caveman" approach, but it should work:
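import os

means = []
stds = []
for file in os.listdir():  # entries in the current working directory
    # Only look at file1.csv, file2.csv, ...
    if not (file.startswith('file') and file.endswith('.csv')):
        continue
    with open(file) as f:  # closes the file after reading
        mean, std = [float(l.split()[1]) for l in f]
    means.append(mean)
    stds.append(std)
print('mean mean', sum(means)/len(means))
print('mean stds', sum(stds)/len(stds))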
Test:
$ echo "mean 0.21
> std 0.54" > file1.csv
$ echo "mean 0.23
> std 0.56" > file2.csv
$ python -q
>>> import os
>>> means = []
>>> stds = []
>>> for file in os.listdir():
...     if not file.startswith('file'):
...         continue
...     mean, std = [float(l.split()[1]) for l in open(file).readlines()]
...     means.append(mean)
...     stds.append(std)
...
>>> print('mean mean', sum(means)/len(means))
mean mean 0.22
>>> print('mean stds', sum(stds)/len(stds))
mean stds 0.55
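A variant of the same approach, assuming the files match the pattern `file*.csv`: `glob.glob` selects them by name instead of filtering `os.listdir()` by prefix, and a `with` block closes each file after reading. The function name `mean_of_stats` and its `pattern` parameter are illustrative, not taken from the answer above.

```python
import glob
import os
import tempfile


def mean_of_stats(pattern="file*.csv"):
    """Average the 'mean' and 'std' values across every file matching pattern.

    Assumes each file contains exactly two non-blank lines:
    'mean <value>' followed by 'std <value>'.
    """
    means, stds = [], []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # Skip blank lines so a trailing newline does not break unpacking
            mean, std = [float(line.split()[1]) for line in f if line.strip()]
        means.append(mean)
        stds.append(std)
    if not means:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return sum(means) / len(means), sum(stds) / len(stds)


# Demo: recreate the two test files from the session above in a temp directory
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("file1.csv", "mean 0.21\nstd 0.54\n"),
                       ("file2.csv", "mean 0.23\nstd 0.56\n")]:
        with open(os.path.join(tmp, name), "w") as out:
            out.write(text)
    # glob.escape guards against special characters in the directory name
    mean_mean, mean_std = mean_of_stats(os.path.join(glob.escape(tmp), "file*.csv"))

print("mean mean", mean_mean)
print("mean stds", mean_std)
```

Run from the directory holding your files, `mean_of_stats()` with the default pattern would pick up file1.csv through file100.csv.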
Answer 2 (score: 1)
If file1.csv through file100.csv are all in the same directory, you can use the following Python script:
#!/usr/bin/env python3
N = 100
mean_sum = 0
std_sum = 0
for i in range(1, N + 1):
    with open(f"file{i}.csv") as f:
        mean_sum += float(f.readline().split(",")[1])
        std_sum += float(f.readline().split(",")[1])
print(f"Mean of means: {mean_sum / N}")
print(f"Mean of stds: {std_sum / N}")
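This assumes the files really are formatted as CSV, with a comma delimiter. If the fields in your snippet are separated only by whitespace, use .split() instead of .split(",").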
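If you are not sure which delimiter the files use, one option is to treat commas as whitespace before splitting, so the same code accepts both `mean,0.21` and `mean 0.21`. This is a sketch rather than part of the original answer: the helpers `read_value` and `mean_of_files`, and the check that each line starts with the expected label, are assumptions added for illustration.

```python
import os
import tempfile


def read_value(line, expected_label):
    """Parse 'label,value' or 'label value' and return the value as a float."""
    # Turn commas into spaces so one split() handles both delimiters
    label, value = line.replace(",", " ").split()[:2]
    if label != expected_label:
        raise ValueError(f"expected {expected_label!r}, got {label!r}")
    return float(value)


def mean_of_files(directory, n):
    """Average mean/std over file1.csv .. file{n}.csv in directory."""
    mean_sum = 0.0
    std_sum = 0.0
    for i in range(1, n + 1):
        with open(os.path.join(directory, f"file{i}.csv")) as f:
            mean_sum += read_value(f.readline(), "mean")
            std_sum += read_value(f.readline(), "std")
    return mean_sum / n, std_sum / n


# Demo: one comma-separated file and one space-separated file
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file1.csv"), "w") as out:
        out.write("mean,0.21\nstd,0.54\n")
    with open(os.path.join(tmp, "file2.csv"), "w") as out:
        out.write("mean 0.23\nstd 0.56\n")
    mean_of_means, mean_of_stds = mean_of_files(tmp, 2)

print(f"Mean of means: {mean_of_means}")
print(f"Mean of stds: {mean_of_stds}")
```

For the question's data, call `mean_of_files(".", 100)` from the directory containing the files. The label check makes a file with swapped or missing lines fail loudly instead of silently mixing means and stds.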