Creating an average DataFrame with pandas

Time: 2017-04-15 04:46:04

Tags: python excel pandas dataframe average

From country     Austria  Belgium  Denmark  France  Germany  Italy  Luxembourg  Switzerland  The Netherlands  United Kingdom
Austria                0        0        0       0        0      0           3            0                6               1
Belgium                0        0        0       2        1      1           0            0                5               1
Denmark                0        2        0       2        0      1           0            2                3               0
France                 0        0        0       0        6      0           0            0                4               0
Germany                0        2        0       6        0      0           0            1                1               0
Italy                  0        0        3       0        1      0           4            1                1               0
Luxembourg             0        0        0       4        0      1           0            1                3               1
Switzerland            0        1        0       0        0      0           0            0                7               2
The Netherlands        1        0        5       1        0      2           0            0                0               1
United Kingdom         2        0        2       2        0      2           1            0                1               0

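Here I have a table whose values are the points that the country in each row assigned to the country in each column. I have 60 such tables in total, and I am trying to create a final table that looks the same but whose values are the averages over all 60 tables. I could not find any function in pandas, or anywhere else on Stack Exchange, that averages each value the way I am trying to. How can I solve this?

PS: some of the tables contain more or fewer countries.

2 answers:

Answer 0: (score: 2)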

You can first call read_excel with the parameter sheet_name=None (spelled sheetname in pandas versions before 0.21) to get a dict of DataFrames. Then use concat to build one large df with a MultiIndex, and finally groupby on the second level of the index and aggregate with mean:

    import pandas as pd

    dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None)
    print (dict_dfs)
    {'sheetname1':    a  b
    0  1  4
    1  2  8, 'sheetname2':    a  b
    0  7  1
    1  5  0, 'sheetname3':    a  b
    0  4  5}

    df = pd.concat(dict_dfs)
    print (df)
                  a  b
    sheetname1 0  1  4
               1  2  8
    sheetname2 0  7  1
               1  5  0
    sheetname3 0  4  5

    df = df.groupby(level=1).mean()
    print (df)
         a         b
    0  4.0  3.333333
    1  3.5  4.000000

EDIT:

Sample with your data file:

    dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None, index_col=0)
    df = pd.concat(dict_dfs)
    df = df.groupby(level=1).mean()
    print (df)
                     Austria  Belgium  Denmark  France  Germany  Italy  \
    Fromcountry
    Austria                4        0        0       0        0      0
    Belgium                0        0        0       2        1      1
    Denmark                0        2        0       2        0      1
    France                 0        0        0       0        6      0
    Germany                0        2        0       6        0      0
    Italy                  0        0        3       0        1      0
    Luxembourg             0        0        0       4        0      1
    Switzerland            0        1        0       0        0      0
    The Netherlands        1        0        5       1        0      2
    USA                    3        4        0       0        0      0
    United Kingdom         2        0        2       2        0      2

                     Luxembourg  Switzerland  The Netherlands  USA  United Kingdom
    Fromcountry
    Austria                   3            0                6  4.0               1
    Belgium                   0            0                5  4.0               1
    Denmark                   0            2                3  5.0               0
    France                    0            0                4  0.0               0
    Germany                   0            1                1  0.0               0
    Italy                     4            1                1  0.0               0
    Luxembourg                0            1                3  0.0               1
    Switzerland               0            0                7  0.0               2
    The Netherlands           0            0                0  0.0               1
    USA                       0            0                0  0.0               0
    United Kingdom            1            0                1  0.0               0

If some sheets contain more countries, finally use reindex to filter by the index and columns names of a reference sheet:

    #reference sheetname - sheetname1
    idx = dict_dfs['sheetname1'].index
    cols = dict_dfs['sheetname1'].columns

    df = df.reindex(index=idx, columns=cols)
    print (df)
                     Austria  Belgium  Denmark  France  Germany  Italy  \
    Fromcountry
    Austria                4        0        0       0        0      0
    Belgium                0        0        0       2        1      1
    Denmark                0        2        0       2        0      1
    France                 0        0        0       0        6      0
    Germany                0        2        0       6        0      0
    Italy                  0        0        3       0        1      0
    Luxembourg             0        0        0       4        0      1
    Switzerland            0        1        0       0        0      0
    The Netherlands        1        0        5       1        0      2
    United Kingdom         2        0        2       2        0      2

                     Luxembourg  Switzerland  The Netherlands  United Kingdom
    Fromcountry
    Austria                   3            0                6               1
    Belgium                   0            0                5               1
    Denmark                   0            2                3               0
    France                    0            0                4               0
    Germany                   0            1                1               0
    Italy                     4            1                1               0
    Luxembourg                0            1                3               1
    Switzerland               0            0                7               2
    The Netherlands           0            0                0               1
    United Kingdom            1            0                1               0
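
The concat, groupby and reindex steps can be tried without an Excel file. The following is a minimal sketch, assuming two small hand-made tables; sheet1, sheet2 and their values are hypothetical stand-ins for what read_excel would return, and the second table contains an extra country:

```python
import pandas as pd

# Hypothetical stand-ins for two sheets; sheet2 has one extra country (USA).
sheet1 = pd.DataFrame(
    [[0, 3], [5, 0]],
    index=pd.Index(["Austria", "Belgium"], name="From country"),
    columns=["Austria", "Belgium"],
)
sheet2 = pd.DataFrame(
    [[0, 1, 4], [3, 0, 2], [6, 1, 0]],
    index=pd.Index(["Austria", "Belgium", "USA"], name="From country"),
    columns=["Austria", "Belgium", "USA"],
)
dict_dfs = {"sheetname1": sheet1, "sheetname2": sheet2}

# Stack the sheets: the sheet name becomes index level 0, the country level 1.
stacked = pd.concat(dict_dfs)

# Average every cell over all sheets that contain it.
average = stacked.groupby(level=1).mean()

# Keep only the rows and columns of the reference sheet.
filtered = average.reindex(index=sheet1.index, columns=sheet1.columns)
print(filtered)
```

With this data, filtered keeps only Austria and Belgium, and the cell from Austria to Belgium is the mean of 3 and 1. Cells for a country that is absent from a sheet are NaN after concat, and mean skips them by default, so such a country is averaged only over the sheets where it appears.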

Answer 1: (score: 2)

Suppose we have a list of DataFrames, tables

tables = [df.set_index('From country').copy() for _ in range(10)]

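We set the index to 'From country' only because it is not already the index. If it already is, skip that step.

Then we convert the list of DataFrames into a pd.Panel and take the mean over axis zero. Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this works only on older versions.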

pd.Panel(dict(enumerate(tables))).mean(0)

If tables is already a dictionary, we can pass it directly to pd.Panel

pd.Panel(tables).mean(0)
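
Since pd.Panel is not available in current pandas releases, the same idea can be approximated with concat and groupby. This is a sketch under assumptions, not the answer's own code: mean_of_tables is a hypothetical helper name, and the small df below is made-up sample data.

```python
import pandas as pd

def mean_of_tables(tables):
    """Average equally labelled cells over a list or dict of DataFrames.

    Hypothetical helper: stacks the tables under an outer index level
    and averages over the inner (row label) level.
    """
    if not isinstance(tables, dict):
        # Give each table a key, as dict(enumerate(tables)) did for pd.Panel.
        tables = dict(enumerate(tables))
    return pd.concat(tables).groupby(level=1).mean()

# Made-up sample data with 'From country' still stored as a column.
df = pd.DataFrame({
    "From country": ["Austria", "Belgium"],
    "Austria": [0, 2],
    "Belgium": [4, 0],
})

# Three tables that are 0x, 1x and 2x the sample, so their mean equals 1x.
tables = [df.set_index("From country") * k for k in range(3)]

result = mean_of_tables(tables)
print(result)
```

The helper accepts either a list or a dict, mirroring the two pd.Panel calls above.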
