Question

I have a table with 1000 rows, like the following, in file1:

apples1 + hate 0 0 0 2 4 6 0 1 
apples2 + hate 0 2 0 4 4 6 0 2 
apples4 + hate 0 2 0 4 4 6 0 2

and another file, file2, with the same headers; some of its headers are missing from file1:

apples1 + hate 0 0 0 1 4 6 0 2 
apples2 + hate 0 1 0 6 4 6 0 2
apples3 + hate 0 2 0 4 4 6 0 2 
apples4 + hate 0 1 0 3 4 3 0 1

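I want to compare the two files in pandas and average the values in the common columns. I don't want to print the columns that appear in only one file. So the resulting file would look like: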

apples1 + hate 0 0 0 1.5 4 6 0 1.5 
apples2 + hate 0 1.5 0 5 4 6 0 2 
apples4 + hate 0 2 0 3.5 4 6 0 2

Answer 1

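This solution has two steps.

1. Concatenate all the DataFrames by stacking them vertically with pandas.concat(...) (axis=0, the default) and specify join='inner', so that only the columns present in every DataFrame are kept.
2. Call mean(...) on the resulting DataFrame.

Example: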

In [1]: df1 = pd.DataFrame([[1,2,3], [4,5,6]], columns=['a','b','c'])
In [2]: df2 = pd.DataFrame([[1,2],[3,4]], columns=['a','c'])
In [3]: df1
Out[3]:
   a  b  c
0  1  2  3
1  4  5  6

In [4]: df2
Out[4]:
   a  c
0  1  2
1  3  4

In [5]: df3 = pd.concat([df1, df2], join='inner')
In [6]: df3
Out[6]:
   a  c
0  1  3
1  4  6
0  1  2
1  3  4

In [7]: df3.mean()
Out[7]:
a    2.25
c    3.75
dtype: float64
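
Note that df3.mean() above gives one mean per column. The question instead matches rows by their label (the first three fields), so here is a minimal sketch of how the same concat idea might be applied to that layout. It assumes the files are whitespace-separated with no header row; the inline strings and the load helper are illustrative stand-ins for file1 and file2, not part of the original answer.

```python
import io
import pandas as pd

# Inline copies of the question's sample data (stand-ins for file1 and file2).
FILE1 = """\
apples1 + hate 0 0 0 2 4 6 0 1
apples2 + hate 0 2 0 4 4 6 0 2
apples4 + hate 0 2 0 4 4 6 0 2
"""
FILE2 = """\
apples1 + hate 0 0 0 1 4 6 0 2
apples2 + hate 0 1 0 6 4 6 0 2
apples3 + hate 0 2 0 4 4 6 0 2
apples4 + hate 0 1 0 3 4 3 0 1
"""


def load(text):
    # Hypothetical helper: parse whitespace-separated rows without a header
    # and use the first three fields as the row label.
    frame = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
    return frame.set_index([0, 1, 2])


df1, df2 = load(FILE1), load(FILE2)

# Keep only the rows whose label appears in both files.
both1 = df1[df1.index.isin(df2.index)]
both2 = df2[df2.index.isin(df1.index)]

# Stack the two frames vertically, then average the rows that share a label.
stacked = pd.concat([both1, both2], join="inner")
row_means = stacked.groupby(level=[0, 1, 2], sort=False).mean()
print(row_means.reset_index())
```

Filtering before the concat is what drops apples3, since join='inner' only restricts the columns, not the rows.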

Answer 2

Let's try this:

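import pandas as pd
# The sample files are whitespace-separated, hence sep=r'\s+'.
df1 = pd.read_csv('file1', sep=r'\s+', header=None)
df2 = pd.read_csv('file2', sep=r'\s+', header=None)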

Set the index to the first three columns, i.e. "apples1 + hate":

df1 = df1.set_index([0,1,2])
df2 = df2.set_index([0,1,2])

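Use merge to inner-join the two frames on the index, then group the columns that share a base name (the merge adds _x and _y suffixes) and aggregate them with mean: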

# Transposing before groupby avoids groupby(axis=1), which is deprecated since pandas 2.1.
df1.merge(df2, right_index=True, left_index=True)\
   .pipe(lambda x: x.T.groupby(x.columns.str.extract(r'(\w+)_[xy]', expand=False),
                               sort=False).mean().T).reset_index()

Output:

         0  1     2    3    4    5    6    7    8    9   10
0  apples1  +  hate  0.0  0.0  0.0  1.5  4.0  6.0  0.0  1.5
1  apples2  +  hate  0.0  1.5  0.0  5.0  4.0  6.0  0.0  2.0
2  apples4  +  hate  0.0  1.5  0.0  3.5  4.0  4.5  0.0  1.5
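
As a cross-check on the output above, here is a sketch of a different technique: index-aligned arithmetic instead of merge plus groupby. The frame helper is hypothetical and builds the question's sample data directly, labelling the value columns 3 to 10 to match the layout used above. It relies on DataFrame.add aligning rows by index label, so any label missing from either frame becomes NaN and can be dropped.

```python
import pandas as pd


def frame(rows):
    # Hypothetical helper: rows maps the first label field to its eight values;
    # the other two label fields ("+" and "hate") are constant in the sample.
    index = pd.MultiIndex.from_tuples([(name, "+", "hate") for name in rows])
    return pd.DataFrame(list(rows.values()), index=index, columns=range(3, 11))


df1 = frame({
    "apples1": [0, 0, 0, 2, 4, 6, 0, 1],
    "apples2": [0, 2, 0, 4, 4, 6, 0, 2],
    "apples4": [0, 2, 0, 4, 4, 6, 0, 2],
})
df2 = frame({
    "apples1": [0, 0, 0, 1, 4, 6, 0, 2],
    "apples2": [0, 1, 0, 6, 4, 6, 0, 2],
    "apples3": [0, 2, 0, 4, 4, 6, 0, 2],
    "apples4": [0, 1, 0, 3, 4, 3, 0, 1],
})

# add() aligns on the row index; apples3 exists only in df2, so its row is
# all NaN after the addition and dropna() removes it.
averaged = df1.add(df2).dropna().div(2)
print(averaged)
```

This only works as an average of exactly two frames and assumes the inputs contain no NaN of their own, since dropna() would discard such rows as well.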

How do I find the average of two tables in pandas?
