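I need to do the same thing my current code does:
df_g['Bidfloor'] = df_g[['Sitio', 'Country']].merge(df_seg, how='left').Precio
but for Country, instead of requiring the whole value to match, I want to match only its first 2 characters, because I can't change the language of the data. So I want to compare only the first 2 characters of the Country column rather than the full Country value.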
df_g:
Sitio,Country
Los Andes Online,HN - Honduras
Guarda14,US - Estados Unidos
Guarda14,PE - Peru
df_seg:
Sitio,Country,Precio
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1
What I need:
Sitio,Country,Bidfloor
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1
Guarda14,PE - Peru,NULL
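To make the problem reproducible, here is a minimal sketch that rebuilds the two sample frames from the data above and runs the exact-match merge. The variable name `exact` is only illustrative. It shows that the US row is left without a price because "US - Estados Unidos" and "US - United States" differ as full strings, even though their first two characters agree.

```python
import pandas as pd

# Sample frames copied from the question.
df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Exact match on the full Country string: only the HN row finds a partner.
exact = df_g.merge(df_seg, how="left", on=["Sitio", "Country"])
print(exact)

# The two-letter prefixes are what the question wants to compare instead.
print(df_g["Country"].str[:2].tolist())
```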
Answer 0 (score: 0)
You need an extra key to help the merge. I use cumcount to tell the duplicated Sitio values apart:
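# df1 and df2 correspond to df_g and df_seg in the question.
# cumcount numbers the rows within each Sitio (0, 1, ...); the merge then pairs
# rows by Sitio and that position. The Country text itself is not compared.
(df1.assign(key=df1.groupby('Sitio').cumcount())
    .merge(df2.assign(key=df2.groupby('Sitio').cumcount())
              .drop(columns='Country'),
           how='left',
           on=['Sitio', 'key']))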
Out[1491]:
              Sitio              Country  key  Precio
0  Los Andes Online        HN - Honduras    0     0.5
1          Guarda14  US - Estados Unidos    0     2.1
2          Guarda14            PE - Peru    1     NaN
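One thing worth checking before relying on this approach: the cumcount key pairs rows by their position within each Sitio, so the result depends on row order. The sketch below is an illustration on the question's sample data, not part of the original answer. It swaps the two Guarda14 rows (the name `df1_swapped` is made up here) and shows that the US price then lands on the PE row.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df2 = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Position of each row inside its Sitio group.
key = df1.groupby("Sitio").cumcount()
print(key.tolist())

# Same data, but with PE listed before US for Guarda14.
df1_swapped = df1.iloc[[0, 2, 1]].reset_index(drop=True)
merged = (
    df1_swapped.assign(key=df1_swapped.groupby("Sitio").cumcount())
    .merge(
        df2.assign(key=df2.groupby("Sitio").cumcount()).drop(columns="Country"),
        how="left",
        on=["Sitio", "key"],
    )
)
print(merged)
```

If the row order of the two frames is not guaranteed to line up, matching on the two-letter country prefix is the safer choice.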
Answer 1 (score: 0)
You can do this by adding a temporary merge column and dropping it afterwards:
df_seg['merge_col'] = df_seg.Country.apply(lambda x: x.split('-')[0])
df_g['merge_col'] = df_g.Country.apply(lambda x: x.split('-')[0])
Then do:
df = pd.merge(df_g, df_seg[['Sitio', 'merge_col', 'Precio']], on=['Sitio', 'merge_col'], how='left').drop(columns='merge_col')
which returns:
              Sitio              Country  Precio
0  Los Andes Online        HN - Honduras     0.5
1          Guarda14  US - Estados Unidos     2.1
2          Guarda14            PE - Peru     NaN
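The same idea can be wrapped in a small helper that leaves both input frames untouched. This is a sketch under a few assumptions: the function name `merge_on_country_code` and the temporary column name `code` are invented for illustration, the country code is always the first two characters of Country, and neither frame already has a column called `code`.

```python
import pandas as pd

def merge_on_country_code(left, right, value_col="Precio", out_col="Bidfloor"):
    """Left-merge on Sitio plus the first two characters of Country."""
    # assign() returns copies, so the caller's frames are not modified.
    left_keyed = left.assign(code=left["Country"].str[:2])
    right_keyed = right.assign(code=right["Country"].str[:2])[["Sitio", "code", value_col]]
    merged = left_keyed.merge(right_keyed, how="left", on=["Sitio", "code"])
    return merged.drop(columns="code").rename(columns={value_col: out_col})

df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

result = merge_on_country_code(df_g, df_seg)
print(result)
```

Merging on both Sitio and the code keeps a price from one site from being attached to a different site that happens to share the country.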
Answer 2 (score: 0)
A simpler and cleaner way is to use the index. Give both data frames the same index, then assign Precio:
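# Build a two-letter key in a helper column instead of overwriting Country,
# so the original country labels survive.
df_g["code"] = df_g["Country"].str[:2]
df_seg["code"] = df_seg["Country"].str[:2]
df_g = df_g.set_index(["code", "Sitio"])
df_seg = df_seg.set_index(["code", "Sitio"])
# Assignment aligns on the shared index; rows without a match become NaN.
df_g["Bidfloor"] = df_seg["Precio"]
df_g = df_g.reset_index().drop(columns="code")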
Sitio             Country              Bidfloor
Los Andes Online  HN - Honduras        0.5
Guarda14          US - Estados Unidos  2.1
Guarda14          PE - Peru            NaN
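Index alignment like this only works when each (Sitio, country code) pair appears at most once in df_seg. A hedged variant that checks this explicitly and avoids modifying either frame might look as follows; the names `prices` and `keys` are illustrative, and the two-character prefix is assumed to be the country code.

```python
import pandas as pd

df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Price lookup indexed by (Sitio, two-letter code).
prices = (
    df_seg.assign(code=df_seg["Country"].str[:2])
    .set_index(["Sitio", "code"])["Precio"]
)
# Duplicate keys would make the lookup ambiguous, so fail early.
if not prices.index.is_unique:
    raise ValueError("df_seg has duplicate (Sitio, country code) pairs")

# One lookup key per row of df_g; keys missing from prices yield NaN.
keys = pd.MultiIndex.from_arrays([df_g["Sitio"], df_g["Country"].str[:2]])
result = df_g.assign(Bidfloor=prices.reindex(keys).to_numpy())
print(result)
```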