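I have a dataset (df1) that I want to fill in with data from a second dataset (df2). The two DataFrames share only one column, ORG_ID, which I set as the index of both df1 and df2 so that I could merge on the index.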
import pandas as pd
df1 = pd.read_excel('Data.xlsx', sheet_name='Dataset1')
df2 = pd.read_excel('Data.xlsx', sheet_name='Dataset2')
df1.set_index("ORG_ID", inplace=True)
df2.set_index("ORG_ID", inplace=True)
# take only the df2 columns that df1 lacks, then merge on the index
df3 = df1.merge(df2[df2.columns.difference(df1.columns)], left_index=True, right_index=True, how="outer")
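The output I want is a new dataset (df3) that lists all the data from df1, including the index (the ORG_IDs), plus all the new columns from df2, populated only for the ORG_IDs that appear in df1. What Python actually gives me is a new DataFrame (df3) that contains df1's data and then appends every ORG_ID from the second dataset (df2) below df1's ORG_IDs, which is not what I want.
I also tried combine_first, but it seems to produce a similar result.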
df3= df1.combine_first(df2)
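The behaviour described above can be reproduced on a cut-down version of the sample tables. The sketch below is an illustration only: it assumes two-row frames built by hand from the data shown in this post rather than read from Data.xlsx, and it compares which ORG_IDs survive an outer merge, combine_first, and a left merge.

```python
import pandas as pd

# Two-row subsets of the sample tables, indexed by ORG_ID (hand-built for illustration).
df1 = pd.DataFrame(
    {"ORG_ID": [1, 2], "COUNTRY": ["Spain", "Greece"]}
).set_index("ORG_ID")
df2 = pd.DataFrame(
    {"ORG_ID": [1, 24], "CUSTOMER": ["T", "C"]}
).set_index("ORG_ID")

# An outer merge and combine_first both keep the union of the two indexes,
# so ORG_ID 24 (present only in df2) shows up in the result.
outer = df1.merge(df2, left_index=True, right_index=True, how="outer")
combined = df1.combine_first(df2)

# A left merge keeps only the ORG_IDs that exist in df1.
left = df1.merge(df2, left_index=True, right_index=True, how="left")

print(sorted(outer.index.tolist()))     # [1, 2, 24]
print(sorted(combined.index.tolist()))  # [1, 2, 24]
print(left.index.tolist())              # [1, 2]
```

In this sketch ORG_ID 2 has no match in df2, so its CUSTOMER value in the left-merged frame is NaN, which matches the NaN rows in the desired output.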
Dataset1 (df1)
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE
1 Spain Madrid Pink Garment 100
2 Greece Chania White Toy 200
3 U.K Manchester Red Garment 300
4 Italy Rome Red Accessory 500
5 Spain Marbella Blue Accessory 20
6 Greece Chania Green Garment 25
7 U.K Manchester Pink Toy 36
8 Italy Siena Red Accessory 150
9 Spain Barcelona White Toy 200
10 Greece Corfu Blue Accessory 500
Dataset2 (df2)
ORG_ID CUSTOMER TYPE PARENT REGION
5 A Pop Rose Europe
10 A Cry Tulip Europe
24 C Fig Lily Europe
89 G Pop Rose Europe
6 R Fig Lily Europe
4 Y Pop Rose Europe
1 T Fig Tulip Europe
7 H Pop Tulip Europe
8 S Fig Rose Europe
Dataset3 (df3) - what I want
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE CUSTOMER TYPE PARENT REGION
1 Spain Madrid Pink Garment 100 T Fig Tulip Europe
2 Greece Chania White Toy 200 NaN NaN NaN NaN
3 U.K Manchester Red Garment 300 NaN NaN NaN NaN
4 Italy Rome Red Accessory 500 Y Pop Rose Europe
5 Spain Marbella Blue Accessory 20 A Pop Rose Europe
6 Greece Chania Green Garment 25 R Fig Lily Europe
7 U.K Manchester Pink Toy 36 H Pop Tulip Europe
8 Italy Siena Red Accessory 150 S Fig Rose Europe
9 Spain Barcelona White Toy 200 NaN NaN NaN NaN
10 Greece Corfu Blue Accessory 500 A Cry Tulip Europe
Answer 0 (score: 2)
There is no need to call set_index on your data sources. You can use merge with the on parameter and how='left'.
import pandas as pd
df1 = pd.read_excel('Data.xlsx', sheet_name='Dataset1')
df2 = pd.read_excel('Data.xlsx', sheet_name='Dataset2')
df3 = df1.merge(df2, how='left', on='ORG_ID')
Output:
ORG_ID COUNTRY TOWN STORE PRODUCT PRICE CUSTOMER TYPE PARENT \
0 1 Spain Madrid Pink Garment 100 T Fig Tulip
1 2 Greece Chania White Toy 200 NaN NaN NaN
2 3 U.K Manchester Red Garment 300 NaN NaN NaN
3 4 Italy Rome Red Accessory 500 Y Pop Rose
4 5 Spain Marbella Blue Accessory 20 A Pop Rose
5 6 Greece Chania Green Garment 25 R Fig Lily
6 7 U.K Manchester Pink Toy 36 H Pop Tulip
7 8 Italy Siena Red Accessory 150 S Fig Rose
8 9 Spain Barcelona White Toy 200 NaN NaN NaN
9 10 Greece Corfu Blue Accessory 500 A Cry Tulip
REGION
0 Europe
1 NaN
2 NaN
3 Europe
4 Europe
5 Europe
6 Europe
7 Europe
8 NaN
9 Europe
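For readers without the Excel file, here is a self-contained sketch of the same left merge. It assumes the sample rows from the question typed in by hand, with only a few of the columns kept for brevity. It also shows an index-based variant using join, which is one possible alternative if ORG_ID has already been set as the index; it is not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the two tables in the question (subset of columns).
df1 = pd.DataFrame({
    "ORG_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "COUNTRY": ["Spain", "Greece", "U.K", "Italy", "Spain",
                "Greece", "U.K", "Italy", "Spain", "Greece"],
    "PRICE": [100, 200, 300, 500, 20, 25, 36, 150, 200, 500],
})
df2 = pd.DataFrame({
    "ORG_ID": [5, 10, 24, 89, 6, 4, 1, 7, 8],
    "CUSTOMER": ["A", "A", "C", "G", "R", "Y", "T", "H", "S"],
    "PARENT": ["Rose", "Tulip", "Lily", "Rose", "Lily",
               "Rose", "Tulip", "Tulip", "Rose"],
})

# Column-based left merge: every df1 row is kept, in df1's order;
# ORG_IDs that exist only in df2 (24 and 89) are dropped.
by_column = df1.merge(df2, how="left", on="ORG_ID")

# Index-based variant: the same left join when ORG_ID is the index.
by_index = df1.set_index("ORG_ID").join(df2.set_index("ORG_ID"), how="left")

print(by_column["ORG_ID"].tolist())           # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(by_column["CUSTOMER"].isna().sum())     # 3 (ORG_IDs 2, 3 and 9)
print(by_index.loc[4, "PARENT"])              # Rose
```

Both variants return ten rows, one per ORG_ID in df1. The choice between them mostly depends on whether ORG_ID is more convenient as a regular column or as the index in the rest of the workflow.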