Get values from different pandas DataFrames for the same id

Date: 2017-10-09 11:08:44

Tags: python pandas

I have 3 DataFrames.

DF1

id   val1   val2      
1    1.1     2.2
2    3.3     6.6

DF2

id   val1   val2      
1    5.1     2.2
3    3.3     6.6
4    2.1     5.2

DF3

id   val1   val2      
1    9.1     3.2
4    8.1     3.2
5    1.3     4.5

Notice that an id appearing in more than one DataFrame (here 1 and 4) has different val1 and val2 values in each.

What I'm looking for is a final df containing the ids that occur more than once, with each DataFrame's values as one column:

id   df1         df2         df3
 1  [1.1,2.2]   [5.1,2.2]   [9.1,3.2]
 4  [2.1,5.2]   [8.1,3.2]    NA

I was considering:

df.groupby(['id']).apply(list)

Is this possible in pandas?

2 answers:

Answer 0: (score: 2)

Use:

import pandas as pd

# list of all DataFrames
dfs = [df1, df2, df3]

# set id as the index, then build a Series of [val1, val2] lists per DataFrame
L = []
for x in dfs:
    x = x.set_index('id')
    L.append(pd.Series(x.values.tolist(), index=x.index))

# join the Series side by side, one column per DataFrame
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
print (df)
           df1         df2         df3
id                                    
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
2   [3.3, 6.6]         NaN         NaN
3          NaN  [3.3, 6.6]         NaN
4          NaN  [2.1, 5.2]  [8.1, 3.2]
5          NaN         NaN  [1.3, 4.5]

# keep only the ids that have values in more than one DataFrame
df = df[df.count(axis=1) > 1]
print (df)
           df1         df2         df3
id                                    
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4          NaN  [2.1, 5.2]  [8.1, 3.2]

Thanks to Arthur Gouveia for suggesting dropna:

df = df.dropna(thresh=2)
print (df)
           df1         df2         df3
id                                    
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4          NaN  [2.1, 5.2]  [8.1, 3.2]
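
For reference, the steps above can be run end to end as a minimal sketch. The DataFrame construction is an assumption reconstructed from the tables in the question, and the names `series_list`, `combined`, `by_count`, and `by_dropna` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (an assumption).
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 5], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})

# One Series per DataFrame: index is id, each value is the row's [val1, val2] list.
series_list = []
for frame in (df1, df2, df3):
    indexed = frame.set_index('id')
    series_list.append(pd.Series(indexed.values.tolist(), index=indexed.index))

# Align the Series on id; ids missing from a DataFrame become NaN in its column.
combined = pd.concat(series_list, axis=1, keys=['df1', 'df2', 'df3'])

# Two ways to keep ids with at least two non-missing columns.
by_count = combined[combined.count(axis=1) > 1]
by_dropna = combined.dropna(thresh=2)

print(by_count)
```

Both filters should select the same rows here: `count(axis=1)` counts the non-missing cells in each row, while `dropna(thresh=2)` keeps rows with at least two of them.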

Edit:

If the values in the id column are not unique, the solution is not as simple:

print (df3)
   id  val1  val2
0   1   9.1   3.2
1   4   8.1   3.2
2   1   1.3   4.5 <-change value to 1

dfs = [df1, df2, df3]
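# select the columns with a list ([['val1','val2']]) and flatten each group's values into one list
L = [x.groupby('id')[['val1','val2']].apply(lambda g: g.values.ravel().tolist()) for x in dfs]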
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
df = df[df.count(axis=1) > 1]
print (df)
           df1         df2                   df3
id                                              
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2, 1.3, 4.5]
4          NaN  [2.1, 5.2]            [8.1, 3.2]
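
The non-unique case can be sketched as a self-contained script as well. The sample data is again an assumption rebuilt from the tables, with the last row of df3 changed to id 1 as in the edit, and `flatten_groups` is a hypothetical helper name. The columns are selected with a list (`[['val1', 'val2']]`), since recent pandas versions no longer accept the tuple-style `['val1', 'val2']` selection on a groupby.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (an assumption).
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
# id 1 appears twice in df3, as in the edit.
df3 = pd.DataFrame({'id': [1, 4, 1], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})


def flatten_groups(frame):
    """Map each id to one flat list of all its val1/val2 values (row by row)."""
    return frame.groupby('id')[['val1', 'val2']].apply(
        lambda g: g.values.ravel().tolist()
    )


combined = pd.concat(
    [flatten_groups(f) for f in (df1, df2, df3)],
    axis=1,
    keys=['df1', 'df2', 'df3'],
)

# Keep ids that have values in more than one DataFrame.
result = combined[combined.count(axis=1) > 1]
print(result)
```

Note that repeated ids are merged into a single longer list, so the pairing of val1 and val2 is only implied by position.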

Answer 1: (score: 0)

# store each row's val1/val2 pair in a new column named after the DataFrame
df1['df1'] = list(df1[['val1', 'val2']].values)
df2['df2'] = list(df2[['val1', 'val2']].values)
df3['df3'] = list(df3[['val1', 'val2']].values)

# outer-merge on id so that every id from every DataFrame is kept
df_result = pd.merge(
    pd.merge(df1[['id', 'df1']], df2[['id', 'df2']], on='id', how='outer'),
    df3[['id', 'df3']], on='id', how='outer')
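
The merge approach keeps every id, so it still needs a filtering step to match the requested output. The following is a hedged sketch of one way to do that, not part of the original answer: the sample data is an assumption rebuilt from the question, `functools.reduce` is swapped in for the nested `pd.merge` calls, and the names `frames`, `pieces`, `merged`, and `result` are illustrative.

```python
from functools import reduce

import pandas as pd

# Sample data rebuilt from the tables in the question (an assumption).
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 5], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})

frames = {'df1': df1, 'df2': df2, 'df3': df3}

# Build one two-column frame per input: id plus a column of [val1, val2] lists.
pieces = []
for name, frame in frames.items():
    piece = frame[['id']].copy()
    piece[name] = pd.Series(frame[['val1', 'val2']].values.tolist(), index=frame.index)
    pieces.append(piece)

# Chain outer merges on id so that every id from every frame is kept.
merged = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), pieces)

# id is an ordinary column here, so a row needs id plus at least two list columns.
result = merged.dropna(thresh=3)
print(result)
```

Copying the `id` column into new frames leaves df1, df2, and df3 unmodified, unlike assigning the list column onto the originals.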