Question

Suppose I have two dataframes:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'person':[1,1,2,2,3], 'sub_id':[20,21,21,21,21], 'otherval':[np.nan, np.nan, np.nan, np.nan, np.nan], 'other_stuff':[1,1,1,1,1]}, columns=['person','sub_id','otherval','other_stuff'])

df2 = pd.DataFrame({'sub_id':[20,21,22,23,24,25], 'otherval':[8,9,10,11,12,13]})

I want each level of person in df1 to include all levels of sub_id (including any duplicates) as well as the corresponding otherval from df2. In other words, my merged result should look like this:

person  sub_id  otherval  other_stuff
     1      20         8            1
     1      21         9          NaN
     1      22        10          NaN
     1      23        11          NaN
     1      24        12          NaN
     1      25        13          NaN
     2      20         8          NaN
     2      21         9            1
     2      21         9            1
     2      22        10          NaN
     2      23        11          NaN
     2      24        12          NaN
     2      25        13          NaN
     3      20         8          NaN
     3      21         9            1
     3      22        10          NaN
     3      23        11          NaN
     3      24        12          NaN
     3      25        13          NaN

Note that there are two rows with sub_id 21 for person==2.
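One way to frame this is as a Cartesian product (every distinct person paired with every row of df2) followed by a left merge that brings in df1's remaining columns. The sketch below is an alternative to the answer's approach, not a reproduction of it; it assumes pandas 1.2 or later for `how='cross'`, and the names `grid` and `result` are illustrative. Because other_stuff is taken straight from df1, person 1 / sub_id 21 gets 1 there.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'person': [1, 1, 2, 2, 3],
                    'sub_id': [20, 21, 21, 21, 21],
                    'otherval': [np.nan] * 5,
                    'other_stuff': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'sub_id': [20, 21, 22, 23, 24, 25],
                    'otherval': [8, 9, 10, 11, 12, 13]})

# Pair every distinct person with every row of df2 (Cartesian product).
# how='cross' requires pandas >= 1.2.
grid = df1[['person']].drop_duplicates().merge(df2, how='cross')

# Attach df1's other columns. Duplicate (person, sub_id) rows in df1
# fan out into duplicate rows here. df1's all-NaN otherval is dropped
# first so the merge does not create otherval_x / otherval_y.
result = grid.merge(df1.drop(columns='otherval'),
                    on=['person', 'sub_id'], how='left')
print(result)
```

A left merge keeps the row order of `grid`, so the result stays grouped by person with sub_id ascending as in df2.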

Answer 1

You can get the desired output with:

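# Right-merge each person's rows with df2, then drop the group index levels
df3 = df1.groupby('person').apply(lambda x: pd.merge(x, df2, on='sub_id', how='right')).reset_index(level=(0, 1), drop=True)
# Rows that exist only in df2 have NaN in person; forward-fill from the previous row
df3.person = df3.person.ffill().astype(int)
print(df3)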
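The `ffill` step assumes that the first row of each person's merged block is one of that person's own rows. If the right merge instead returns rows in df2's key order (as it may, depending on the pandas version), a leading df2-only row would be filled with the previous person's id. A variant sketch that avoids this sets `person` from the group key, assuming `x.name` holds that key inside `groupby(...).apply` (the usual pandas behavior). Recent pandas versions may emit a DeprecationWarning about the grouping column being passed to `apply`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'person': [1, 1, 2, 2, 3],
                    'sub_id': [20, 21, 21, 21, 21],
                    'otherval': [np.nan] * 5,
                    'other_stuff': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'sub_id': [20, 21, 22, 23, 24, 25],
                    'otherval': [8, 9, 10, 11, 12, 13]})

# Right-merge each person's rows with df2, then stamp the group key
# (x.name) onto every row, including rows that only came from df2.
df3 = (df1.groupby('person')
          .apply(lambda x: pd.merge(x, df2, on='sub_id', how='right')
                             .assign(person=x.name))
          .reset_index(drop=True))  # drop the group/inner index levels
print(df3)
```

Because `person` no longer depends on neighbouring rows, the result is the same regardless of how the merge orders its output.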

That should produce:

#     person  sub_id  otherval_x  other_stuff  otherval_y
# 0        1      20         NaN          1.0           8
# 1        1      21         NaN          1.0           9
# 2        1      22         NaN          NaN          10
# 3        1      23         NaN          NaN          11
# 4        1      24         NaN          NaN          12
# 5        1      25         NaN          NaN          13
# 6        2      21         NaN          1.0           9
# 7        2      21         NaN          1.0           9
# 8        2      20         NaN          NaN           8
# 9        2      22         NaN          NaN          10
# 10       2      23         NaN          NaN          11
# 11       2      24         NaN          NaN          12
# 12       2      25         NaN          NaN          13
# 13       3      21         NaN          1.0           9
# 14       3      20         NaN          NaN           8
# 15       3      22         NaN          NaN          10
# 16       3      23         NaN          NaN          11
# 17       3      24         NaN          NaN          12
# 18       3      25         NaN          NaN          13
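The `otherval_x` / `otherval_y` pair appears because both frames have an `otherval` column that is not a join key, so pandas adds its default suffixes `('_x', '_y')`. A minimal sketch of collapsing them back into a single `otherval` column, shown on a plain (ungrouped) right merge; the suffix strings `_df1` / `_df2` are arbitrary choices:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'person': [1, 1, 2, 2, 3],
                    'sub_id': [20, 21, 21, 21, 21],
                    'otherval': [np.nan] * 5,
                    'other_stuff': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'sub_id': [20, 21, 22, 23, 24, 25],
                    'otherval': [8, 9, 10, 11, 12, 13]})

# Name the clashing columns explicitly instead of using _x / _y.
merged = pd.merge(df1, df2, on='sub_id', how='right',
                  suffixes=('_df1', '_df2'))

# Prefer df1's value where present, otherwise fall back to df2's.
merged['otherval'] = merged['otherval_df1'].fillna(merged['otherval_df2'])
merged = merged.drop(columns=['otherval_df1', 'otherval_df2'])
print(merged)
```

Since df1's `otherval` is entirely NaN here, simply dropping it before merging gives the same single column with less work.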

I hope that helps.

Get every combination of two Pandas DataFrames?
