For the purposes of this question, I generated the following two DataFrames:
df1 = pd.DataFrame({"model": [f"model{i//2}" for i in range(6)], "label": [f"label_{i}" for i in range(6)], "data": [f"data_{i}" for i in range(6)]})
df1 = df1.set_index("model")
df2 = pd.DataFrame({"model": [f"model{i}" for i in range(3)], "info": [f"info_{i}" for i in range(3)], "stuff": [f"stuff_{i}" for i in range(3)]})
df2 = df2.set_index("model")
df1 looks like this:
[model] label data
model0 label_0 data_0
model0 label_1 data_1
model1 label_2 data_2
model1 label_3 data_3
model2 label_4 data_4
model2 label_5 data_5
and df2 looks like this:
[model] info stuff
model0 info_0 stuff_0
model1 info_1 stuff_1
model2 info_2 stuff_2
Here [...] marks the DataFrame's index. I would like to join the two DataFrames so that they produce the following output:
[model] info stuff label data
model0 info_0 stuff_0 label_0 data_0
model0 NAN NAN label_1 data_1
model1 info_1 stuff_1 label_2 data_2
model1 NAN NAN label_3 data_3
model2 info_2 stuff_2 label_4 data_4
model2 NAN NAN label_5 data_5
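I can't seem to find any documentation on how to do this. I have tried many combinations of join, concat, and merge, but none of them produced this result. I know I could write a function to do it, but I was hoping it could be done with the native pandas join, concat, or merge functions.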
If someone with more pandas experience could point me in the right direction, I would greatly appreciate it!
Answer 0 (score: 2)
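First, we reset the index so that the two DataFrames can be merged on the model column. Then you can use the duplicated method of pd.Series to mask the repeated values and fill them with NaN: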
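import numpy as np
import pandas as pd
# Turn the "model" index back into a column so it can serve as the merge key.
df1 = df1.reset_index(drop=False)
df2 = df2.reset_index(drop=False)
# With no "on" argument, merge uses the only shared column, "model".
df_new = pd.merge(df1, df2, how='outer')
df_new = df_new.set_index('model')
# Flag, column by column, every value that already appeared in an earlier row.
is_duplicated = df_new.apply(pd.Series.duplicated, axis=0)
df_new = df_new.where(~is_duplicated, np.nan)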
The new DataFrame df_new is the desired result.
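One caveat with a value-based mask: Series.duplicated compares values, so a label or data entry that legitimately repeats would be blanked as well. A possible variant, offered only as a sketch, is to join on the index and blank the df2 columns on every repeated index entry instead. The names joined, repeat, and result are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "model": [f"model{i // 2}" for i in range(6)],
        "label": [f"label_{i}" for i in range(6)],
        "data": [f"data_{i}" for i in range(6)],
    }
).set_index("model")
df2 = pd.DataFrame(
    {
        "model": [f"model{i}" for i in range(3)],
        "info": [f"info_{i}" for i in range(3)],
        "stuff": [f"stuff_{i}" for i in range(3)],
    }
).set_index("model")

# Index-on-index left join: every df1 row receives its model's df2 values.
joined = df1.join(df2, how="left")

# True for the second and later rows of each model (illustrative name).
repeat = joined.index.duplicated(keep="first")

# Blank out only the df2 columns, and only on the repeated rows.
joined.loc[repeat, df2.columns] = float("nan")

# Reorder the columns to match the layout requested in the question.
result = joined[list(df2.columns) + list(df1.columns)]
```

Because the mask comes from the index rather than from the cell values, the label and data columns are never touched, whatever they contain.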
Answer 1 (score: 2)
Here is another approach:
import pandas as pd
df1 = pd.DataFrame({"model": [f"model{i//2}" for i in range(6)], "label": [f"label_{i}" for i in range(6)], "data": [f"data_{i}" for i in range(6)]})
df1 = df1.set_index("model")
df2 = pd.DataFrame({"model": [f"model{i}" for i in range(3)], "info": [f"info_{i}" for i in range(3)], "stuff": [f"stuff_{i}" for i in range(3)]})
df2 = df2.set_index("model")
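df1_g = df1.groupby(by='model').first()
# DataFrame.append was removed in pandas 2.0, so the leftover rows are added with pd.concat.
rest = df1[~df1.isin(df1_g)].dropna()
print(pd.concat([pd.concat([df1_g, df2], axis=1), rest], sort=False).sort_index(kind="mergesort"))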
This prints:
label data info stuff
model
model0 label_0 data_0 info_0 stuff_0
model0 label_1 data_1 NaN NaN
model1 label_2 data_2 info_1 stuff_1
model1 label_3 data_3 NaN NaN
model2 label_4 data_4 info_2 stuff_2
model2 label_5 data_5 NaN NaN
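The idea of attaching df2 only to the first row of each model can also be sketched with a single merge, assuming a temporary counter column is acceptable. The column name n and the variable names left, right, and result are chosen here for illustration; they do not come from either answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "model": [f"model{i // 2}" for i in range(6)],
        "label": [f"label_{i}" for i in range(6)],
        "data": [f"data_{i}" for i in range(6)],
    }
).set_index("model")
df2 = pd.DataFrame(
    {
        "model": [f"model{i}" for i in range(3)],
        "info": [f"info_{i}" for i in range(3)],
        "stuff": [f"stuff_{i}" for i in range(3)],
    }
).set_index("model")

# Number the rows within each model: 0 for the first row, 1 for the next, ...
left = df1.reset_index()
left["n"] = left.groupby("model").cumcount()

# df2 has one row per model, so every row gets counter 0.
right = df2.reset_index().assign(n=0)

# Only (model, 0) pairs find a match; later rows get NaN for info and stuff.
result = (
    left.merge(right, on=["model", "n"], how="left")
    .drop(columns="n")
    .set_index("model")
)
```

A left merge keeps the row order of df1, so no sorting step is needed afterwards.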