I have 3 DataFrames.
DF1
id val1 val2
1 1.1 2.2
2 3.3 6.6
DF2
id val1 val2
1 5.1 2.2
3 3.3 6.6
4 2.1 5.2
DF3
id val1 val2
1 9.1 3.2
4 8.1 3.2
5 1.3 4.5
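Notice that some ids (1 and 4) appear in more than one DataFrame, with different val1 and val2 values in each.
What I am looking for is a final df containing only the ids that occur more than once, with each DataFrame's values in its own column: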
id df1 df2 df3
1 [1.1,2.2] [5.1,2.2] [9.1,3.2]
4 NA [2.1,5.2] [8.1,3.2]
I was thinking of something like:
df.groupby(['id']).apply(list)
Is this possible in pandas?
Answer 0 (score: 2)
Use:
import pandas as pd

# list of all DataFrames
dfs = [df1, df2, df3]
# set id as the index and build a Series of [val1, val2] lists for each DataFrame
L = []
for x in dfs:
    x = x.set_index('id')
    L.append(pd.Series(x.values.tolist(), index=x.index))
# join all Series side by side, aligned on id
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
print(df)
            df1         df2         df3
id
1    [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
2    [3.3, 6.6]         NaN         NaN
3           NaN  [3.3, 6.6]         NaN
4           NaN  [2.1, 5.2]  [8.1, 3.2]
5           NaN         NaN  [1.3, 4.5]
# keep only the ids that have values in at least two DataFrames
df = df[df.count(axis=1) > 1]
print(df)
            df1         df2         df3
id
1    [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4           NaN  [2.1, 5.2]  [8.1, 3.2]
Thanks to Arthur Gouveia for suggesting dropna, which keeps rows with at least 2 non-missing values:
df = df.dropna(thresh=2)
print(df)
            df1         df2         df3
id
1    [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4           NaN  [2.1, 5.2]  [8.1, 3.2]
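For anyone who wants to run this end to end, here is a minimal self-contained sketch of the steps above. The sample data is copied from the question; the helper name rows_as_lists and the frames dict are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"id": [1, 2], "val1": [1.1, 3.3], "val2": [2.2, 6.6]})
df2 = pd.DataFrame({"id": [1, 3, 4], "val1": [5.1, 3.3, 2.1], "val2": [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({"id": [1, 4, 5], "val1": [9.1, 8.1, 1.3], "val2": [3.2, 3.2, 4.5]})

# Hypothetical container: maps the output column name to its DataFrame.
frames = {"df1": df1, "df2": df2, "df3": df3}


def rows_as_lists(frame):
    """Return a Series indexed by id whose values are [val1, val2] lists."""
    indexed = frame.set_index("id")
    return pd.Series(indexed.values.tolist(), index=indexed.index)


# One column per input DataFrame, aligned on id (outer join).
combined = pd.concat(
    [rows_as_lists(f) for f in frames.values()],
    axis=1,
    keys=list(frames),
)

# Two ways to keep ids present in at least two DataFrames.
by_count = combined[combined.count(axis=1) > 1]
by_dropna = combined.dropna(thresh=2)

print(by_count)
```

Both filters should select the same rows here, since count(axis=1) > 1 and dropna(thresh=2) each require at least two non-missing cells per row.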
EDIT:
If the values in the id column are not unique, the solution is a little more involved:
print(df3)
   id  val1  val2
0   1   9.1   3.2
1   4   8.1   3.2
2   1   1.3   4.5   <- id changed to 1
dfs = [df1, df2, df3]
# select the columns with a list: tuple-style ['val1','val2'] is rejected by recent pandas versions
L = [x.groupby('id')[['val1','val2']].apply(lambda g: g.values.ravel().tolist()) for x in dfs]
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
df = df[df.count(axis=1) > 1]
print(df)
           df1         df2                   df3
id
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2, 1.3, 4.5]
4          NaN  [2.1, 5.2]            [8.1, 3.2]
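The duplicate-id case can also be written as an explicit loop over the groups, which some may find easier to follow than groupby(...).apply. This is only a sketch under the same sample data (with the last id of df3 changed to 1, as in the edit); flatten_per_id is a hypothetical helper name.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val1": [1.1, 3.3], "val2": [2.2, 6.6]})
df2 = pd.DataFrame({"id": [1, 3, 4], "val1": [5.1, 3.3, 2.1], "val2": [2.2, 6.6, 5.2]})
# df3 with a repeated id: the last row uses id 1 instead of 5.
df3 = pd.DataFrame({"id": [1, 4, 1], "val1": [9.1, 8.1, 1.3], "val2": [3.2, 3.2, 4.5]})


def flatten_per_id(frame):
    """Collect all val1/val2 pairs of each id into one flat list, row by row."""
    flat = {
        key: group[["val1", "val2"]].to_numpy().ravel().tolist()
        for key, group in frame.groupby("id")
    }
    return pd.Series(flat).rename_axis("id")


result = pd.concat(
    [flatten_per_id(f) for f in (df1, df2, df3)],
    axis=1,
    keys=("df1", "df2", "df3"),
)
# Keep only ids that have values in at least two DataFrames.
result = result[result.count(axis=1) > 1]

print(result)
```

Note that a flat list loses the boundary between rows; if that matters, dropping the ravel() call would keep one [val1, val2] pair per row instead.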
Answer 1 (score: 0)
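# pack val1 and val2 into a single column per DataFrame (this adds a column to df1, df2 and df3)
df1['df1'] = list(df1[['val1', 'val2']].values)
df2['df2'] = list(df2[['val1', 'val2']].values)
df3['df3'] = list(df3[['val1', 'val2']].values)
# outer-merge on id so that every id from any DataFrame is kept
df_result = pd.merge(
    pd.merge(df1[['id', 'df1']], df2[['id', 'df2']], on='id', how='outer'),
    df3[['id', 'df3']], on='id', how='outer')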
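If there are more than three DataFrames, nesting pd.merge calls gets unwieldy. One possible generalization, sketched here with functools.reduce, also adds the filtering step the question asks for. The pack helper and the frames dict are names I made up for this example, and it assumes ids are unique within each DataFrame.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "val1": [1.1, 3.3], "val2": [2.2, 6.6]})
df2 = pd.DataFrame({"id": [1, 3, 4], "val1": [5.1, 3.3, 2.1], "val2": [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({"id": [1, 4, 5], "val1": [9.1, 8.1, 1.3], "val2": [3.2, 3.2, 4.5]})

# Hypothetical container: maps the output column name to its DataFrame.
frames = {"df1": df1, "df2": df2, "df3": df3}


def pack(frame, name):
    """Return a copy with two columns: id and `name`, holding [val1, val2] lists."""
    packed = frame[["id"]].copy()
    packed[name] = pd.Series(frame[["val1", "val2"]].values.tolist(), index=frame.index)
    return packed


packed_frames = [pack(f, name) for name, f in frames.items()]

# Fold the list with successive outer merges on id.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="id", how="outer"),
    packed_frames,
)

# Move id into the index first so it is not counted by thresh.
result = merged.set_index("id").dropna(thresh=2)

print(result)
```

Unlike the in-place assignments above, pack works on a copy, so the original df1, df2 and df3 are left unchanged.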