Question

I have files in the following format, each one a replicate of a simulation experiment I have been running:

generation, ratio_of_player_A, ratio_of_player_B, ratio_of_player_C

So the data looks something like

0, 0.33, 0.33, 0.33

1, 0.40, 0.40, 0.20

2, 0.50, 0.40, 0.10

etc

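Now, because I run it multiple times, I have about 1000 files per experiment, each containing numbers like these. My problem is how to average them all for one set of experiments.

So I would like to have one file that contains the average ratio after each generation (averaged over the replicates, i.e. the files).

All the replicate output files that need to be averaged have names like output1.csv, output2.csv, output3.csv ..... output1000.csv

If someone could help me out with a shell script or a python script, I would be obliged.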

Answer 1

If I understood correctly, let's say you have two files:

$ cat file1
0, 0.33, 0.33, 0.33
1, 0.40, 0.40, 0.20
2, 0.50, 0.40, 0.10

$ cat file2
0, 0.99, 1, 0.02
1, 0.10, 0.90, 0.90
2, 0.30, 0.10, 0.30

You want to average each column across the two files. So here is a way to do it for the first column:

Edit: I found a better way, using pd.concat:

all_files = pd.concat([file1,file2]) # you can easily put your 1000 files here
result = {}
for i in range(3): # 3 being number of generations
    result[i] = all_files[i::3].mean()
result_df = pd.DataFrame(result)
result_df
                       0     1     2
ratio_of_player_A  0.660  0.25  0.40
ratio_of_player_B  0.665  0.65  0.25
ratio_of_player_C  0.175  0.55  0.20
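
The `all_files[i::3]` slicing relies on every file holding the same generations in the same order. A minimal sketch that groups on the generation index instead, assuming the files have no header row and are named as in the question; the function name `average_replicates` and the output name `average.csv` are made up for illustration:

```python
import glob

import pandas as pd

NAMES = ["generation", "ratio_of_player_A", "ratio_of_player_B", "ratio_of_player_C"]


def average_replicates(paths):
    """Return the per-generation mean of every ratio column across the given files."""
    # skipinitialspace=True drops the blank that follows each comma in the sample data
    frames = [
        pd.read_csv(p, names=NAMES, index_col=0, skipinitialspace=True)
        for p in paths
    ]
    # Stack all replicates, then average the rows that share a generation number
    return pd.concat(frames).groupby(level=0).mean()


if __name__ == "__main__":
    paths = sorted(glob.glob("output*.csv"))  # file pattern taken from the question
    if paths:
        average_replicates(paths).to_csv("average.csv")  # hypothetical output name
```

Here the result has one row per generation and one column per player, so it can be written out directly as the single averaged file the question asks for.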

Another way is to use merge, but it requires several merges:

import pandas as pd

In [1]: names = ["generation", "ratio_of_player_A", "ratio_of_player_B", "ratio_of_player_C"]
In [2]: file1 = pd.read_csv("file1", index_col=0, names=names)
In [3]: file2 = pd.read_csv("file2", index_col=0, names=names)
In [3]: file1
Out[3]: 
            ratio_of_player_A  ratio_of_player_B  ratio_of_player_C
generation                                                         
0                        0.33               0.33               0.33
1                        0.40               0.40               0.20
2                        0.50               0.40               0.10

In [4]: file2
Out[4]: 
            ratio_of_player_A  ratio_of_player_B  ratio_of_player_C
generation                                                         
0                        0.99                1.0               0.02
1                        0.10                0.9               0.90
2                        0.30                0.1               0.30



In [5]: merged_file = file1.merge(file2, right_index=True, left_index=True, suffixes=["_1","_2"])
In [6]: merged_file.filter(regex="ratio_of_player_A_*").mean(axis=1)
Out[6]:
generation
0             0.66
1             0.25
2             0.40
dtype: float64

Or like this (a bit faster, I guess):

merged_file.iloc[:, ::3].mean(axis=1) # player A; .iloc replaces .ix, which newer pandas versions no longer provide

If you have more than two files, you can merge them recursively before applying the mean() method.
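
One way to read "merge recursively" is to fold the list of DataFrames with `functools.reduce`. The sketch below is only an illustration under that assumption: `merge_and_average` is a made-up helper, and each frame's columns are tagged with a file number first so that names never collide, which replaces the `suffixes` argument used above.

```python
from functools import reduce

import pandas as pd


def merge_and_average(frames):
    """Merge DataFrames on their index, then average same-named columns row by row."""
    columns = list(frames[0].columns)
    # Tag every column with its file number so that names never collide in the merge
    tagged = [df.add_suffix(f"_{k}") for k, df in enumerate(frames, start=1)]
    # Fold the list into one wide table, one inner merge on the index at a time
    merged = reduce(
        lambda left, right: left.merge(right, left_index=True, right_index=True),
        tagged,
    )
    # For each original column, average its tagged copies across the files
    means = {
        col: merged[[f"{col}_{k}" for k in range(1, len(frames) + 1)]].mean(axis=1)
        for col in columns
    }
    return pd.DataFrame(means)
```

With 1000 files this builds a table with 3000 columns, so the pd.concat approach is probably the lighter option; the merge version mainly helps when you also want to inspect the replicates side by side.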

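If I misunderstood the question, please tell us what output you expect from file1 and file2.

Ask if there is anything you don't understand.

Hope this helps!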

Answer 2

You can load each of the 1000 experiments into a DataFrame, add them all up, and then compute the average.

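import tkinter.filedialog
import pandas as pd

filepath = tkinter.filedialog.askopenfilenames(filetypes=[('CSV','*.csv')]) # select your files
dfs = [] # one DataFrame per selected file
for file in filepath:
    # sep=';' and decimal=',' suit semicolon-separated files with decimal commas;
    # for the comma-separated data in the question, drop both arguments
    df = pd.read_csv(file, sep=';', decimal=',')
    dfs.append(df)

temp = dfs[0] # running total, starting with the first DataFrame
for i in range(1, len(dfs)): # start from 1 because dfs[0] is already in temp
    temp = temp + dfs[i]
result = temp / len(dfs)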

Answer 3

Your question is not very clear.. If I understood it correctly..

# start with an empty temp file, then append every csv file to it
>temp
for i in *.csv
do
    cat "$i" >> temp
done

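Then you will have all the data from the different files in one big file. Try loading it into an SQLite database (1. create a table, 2. insert the data). After that you can query your data, e.g. select sum(column)/count(column) from yourtablehavingtempdata, and so on. Take a look at SQLite: since your data is tabular, SQLite would be a better fit in my opinion.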
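
The create table / insert / query steps can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `ratios`, the column names `a`, `b`, `c` and the function `average_with_sqlite` are made up, it reads the CSV files directly instead of going through the temp file, and it uses SQLite's AVG aggregate in place of sum(column)/count(column).

```python
import csv
import sqlite3


def average_with_sqlite(paths):
    """Load every CSV into an in-memory SQLite table and average per generation."""
    con = sqlite3.connect(":memory:")
    # Table and column names are illustrative
    con.execute("CREATE TABLE ratios (generation INTEGER, a REAL, b REAL, c REAL)")
    for path in paths:
        with open(path, newline="") as handle:
            rows = [
                [field.strip() for field in row]
                for row in csv.reader(handle)
                if row  # skip blank lines
            ]
        con.executemany("INSERT INTO ratios VALUES (?, ?, ?, ?)", rows)
    # One result row per generation: (generation, mean A, mean B, mean C)
    result = con.execute(
        "SELECT generation, AVG(a), AVG(b), AVG(c) "
        "FROM ratios GROUP BY generation ORDER BY generation"
    ).fetchall()
    con.close()
    return result
```

Grouping by generation is what turns the single overall average from the query above into one average per generation, which is what the question asks for.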

Answer 4

The following should work:

from numpy import genfromtxt

files = ["file1", "file2", ...]

data = genfromtxt(files[0], delimiter=',')
for f in files[1:]:
    data += genfromtxt(f, delimiter=',')

data /= len(files)
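
A variant of the same idea, sketched under the assumption that every file has the same number of rows and columns: stack the arrays and average over the file axis. The function name `average_arrays` is made up for illustration.

```python
import numpy as np


def average_arrays(paths):
    """Average equally shaped CSV tables element by element."""
    # Resulting shape: (number of files, number of generations, number of columns)
    stacked = np.stack([np.genfromtxt(p, delimiter=",") for p in paths])
    # Averaging over axis 0 collapses the file dimension
    return stacked.mean(axis=0)
```

The generation column averages to itself, so the result can be written out as is, for example with `np.savetxt("average.csv", average_arrays(files), delimiter=",")`, where `average.csv` is a hypothetical output name. Stacking keeps all files in memory at once, whereas the running sum above holds only one at a time; `np.stack` also raises an error when a file has a different shape, which helps catch truncated runs.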

How to calculate the average of multiple csv files?

4 answers: