Question

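I'm new to pandas. I have several DataFrames (dfs). Column 0 holds the ID and columns 1-10 hold probabilities. I want to take the element-wise average of columns 1-10 across the dfs. The rows may not be in the same order.

Is there a better way than sorting each df on ID and then using the add / divide df functions? Any help is appreciated.

Thanks very much for the comments. To clarify, I need to average the 2 dfs element-wise, i.e. (showing only 1 row of each df):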

Df1:      id132456, 1,   2, 3, 4
Df2:      id132456, 2,   2, 3, 2
Averaged: id132456, 1.5, 2, 3, 3

Answer 1

It looks like you need concat and mean:

import pandas as pd

df1 = pd.DataFrame({0:[14254,25445,34555],
                   1:[1,2,3],
                   2:[1,1,1],
                   3:[1,2,0]})

print (df1)
       0  1  2  3
0  14254  1  1  1
1  25445  2  1  2
2  34555  3  1  0

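# keys listed in column order so the printed layout below matches on current pandas,
# which keeps dict insertion order instead of sorting the keys
df2 = pd.DataFrame({0:[14254,25445,34555],
                    1:[1,0,1],
                    2:[1,0,0],
                    3:[1,2,0]})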

print (df2)
       0  1  2  3
0  14254  1  1  1
1  25445  0  0  2
2  34555  1  0  0

#list of all DataFrames
dfs = [df1, df2]
print (pd.concat(dfs, ignore_index=True))
       0  1  2  3
0  14254  1  1  1
1  25445  2  1  2
2  34555  3  1  0
3  14254  1  1  1
4  25445  0  0  2
5  34555  1  0  0

#select all columns except the first (.ix was removed from pandas, so use .iloc)
print (pd.concat(dfs, ignore_index=True).iloc[:,1:])
   1  2  3
0  1  1  1
1  2  1  2
2  3  1  0
3  1  1  1
4  0  0  2
5  1  0  0

I'm not sure which kind of mean you need, so I'm adding both:

#mean per row
print (pd.concat(dfs, ignore_index=True).iloc[:,1:].mean(axis=1))
0    1.000000
1    1.666667
2    1.333333
3    1.000000
4    0.666667
5    0.333333
dtype: float64

#mean per column
print (pd.concat(dfs, ignore_index=True).iloc[:,1:].mean())
1    1.333333
2    0.666667
3    1.000000
dtype: float64

Or maybe you need something else:

dfs = [df1.set_index(0), df2.set_index(0)]
print (pd.concat(dfs, ignore_index=True, axis=1))
       0  1  2  3  4  5
0                      
14254  1  1  1  1  1  1
25445  2  1  2  0  0  2
34555  3  1  0  1  0  0

print (pd.concat(dfs, ignore_index=True, axis=1).mean(1))
0
14254    1.000000
25445    1.166667
34555    0.833333
dtype: float64

print (pd.concat(dfs, ignore_index=True, axis=1).mean())
0    2.000000
1    1.000000
2    1.000000
3    0.666667
4    0.333333
5    1.000000
dtype: float64
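
If what is needed is the element-wise average from the question's clarification (one averaged row per ID, regardless of row order), one possible approach is to stack the frames with concat and then group on the ID column. This is only a minimal sketch: the frame names df_a and df_b and the second ID "id999999" are made up for illustration, and the first ID reuses the row shown in the question.

```python
import pandas as pd

# Hypothetical sample frames: column 0 is the ID, columns 1-4 are values.
# The rows of df_b are deliberately in a different order than in df_a.
df_a = pd.DataFrame({0: ["id132456", "id999999"],
                     1: [1, 5], 2: [2, 6], 3: [3, 7], 4: [4, 8]})
df_b = pd.DataFrame({0: ["id999999", "id132456"],
                     1: [7, 2], 2: [6, 2], 3: [9, 3], 4: [8, 2]})

# Stack all frames on top of each other, then average the rows sharing an ID.
# groupby(0) groups on the column labelled 0, so no sorting is needed first.
averaged = pd.concat([df_a, df_b], ignore_index=True).groupby(0).mean()

# The ID becomes the index; reset_index() turns it back into a column.
print(averaged.reset_index())
```

For id132456 this gives 1.5, 2, 3, 3, the averaged row expected in the question. The same pattern extends to any number of frames by adding them to the list passed to concat.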
