Question

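I need to do the same thing as my current code, df_g['Bidfloor'] = df_g[['Sitio', 'Country']].merge(df_seg, how='left').Precio, with one difference. For Country, the match should use only the first 2 characters (the country code), not the whole value, because I can't change the language of the data. In other words, I want to compare only the first 2 characters of the Country column instead of the full string.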

df_g:

Sitio,Country
Los Andes Online,HN - Honduras
Guarda14,US - Estados Unidos
Guarda14,PE - Peru

df_seg:

Sitio,Country,Precio
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1

What I need:

Sitio,Country,Bidfloor
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1
Guarda14,PE - Peru,NULL
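To reproduce the question, the two sample frames can be built as below. The last two lines show what "only the first 2 characters of Country" means in pandas terms: the .str[:2] slice extracts the country code, which is the same in both languages. This assumes every Country value starts with a 2-letter code.

```python
import pandas as pd

# Sample data from the question
df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Only the first 2 characters (the country code) are language-independent
print(df_g["Country"].str[:2].tolist())    # ['HN', 'US', 'PE']
print(df_seg["Country"].str[:2].tolist())  # ['HN', 'US']
```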

Answer 1

You need an extra key to help the merge. I use cumcount to tell the duplicate Sitio values apart:

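# df1 is df_g and df2 is df_seg from the question
(df1.assign(key=df1.groupby('Sitio').cumcount())
    .merge(df2.assign(key=df2.groupby('Sitio').cumcount())
              .drop(columns='Country'),  # keyword form; drop('Country', 1) fails in pandas 2.x
           how='left',
           on=['Sitio', 'key']))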
Out[1491]: 
              Sitio              Country  key  Precio
0  Los Andes Online        HN - Honduras    0     0.5
1          Guarda14  US - Estados Unidos    0     2.1
2          Guarda14            PE - Peru    1     NaN
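Below is a self-contained run of this answer on the question's sample data, assuming pandas 2.x. The second half is an extra check added here and is not part of the answer. Because the key is each row's position within its Sitio group, it never looks at the country code. If Guarda14's rows in df_g were listed in the other order, PE would be paired with the US price.

```python
import pandas as pd

df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# key = position of the row within its Sitio group (0, 1, ...)
left = df_g.assign(key=df_g.groupby("Sitio").cumcount())
right = df_seg.assign(key=df_seg.groupby("Sitio").cumcount()).drop(columns="Country")
result = left.merge(right, how="left", on=["Sitio", "key"])
print(result)

# Same data with Guarda14's rows swapped: the positional key now pairs PE with 2.1
swapped = df_g.iloc[[0, 2, 1]].reset_index(drop=True)
bad = swapped.assign(key=swapped.groupby("Sitio").cumcount()).merge(
    right, how="left", on=["Sitio", "key"]
)
print(bad)
```

So this works on the sample only because both frames happen to list each site's countries in the same order.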

Answer 2

You can do it by adding a merge column and dropping it again afterwards:

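import pandas as pd

# Merge key: the country code before the '-' (e.g. 'HN'), whitespace stripped
df_seg['merge_col'] = df_seg.Country.str.split('-').str[0].str.strip()

df_g['merge_col'] = df_g.Country.str.split('-').str[0].str.strip()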

Then merge on it:

df = pd.merge(df_g, df_seg[['Sitio', 'merge_col', 'Precio']],
              on=['Sitio', 'merge_col'], how='left').drop(columns='merge_col')

which returns:

              Sitio              Country  Precio
0  Los Andes Online        HN - Honduras     0.5
1          Guarda14  US - Estados Unidos     2.1
2          Guarda14            PE - Peru     NaN
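The same merge-column idea can also be written in the shape of the question's original one-liner. This sketch assumes the country code is always the first 2 characters of Country. It uses validate="many_to_one", a standard pd.merge option, so pandas raises an error if df_seg has more than one price for the same Sitio and code. Without that check, duplicate keys would add rows and misalign the assignment.

```python
import pandas as pd

df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Merge on Sitio plus the 2-character country code instead of the full Country text
merged = df_g.assign(code=df_g["Country"].str[:2]).merge(
    df_seg.assign(code=df_seg["Country"].str[:2])[["Sitio", "code", "Precio"]],
    on=["Sitio", "code"],
    how="left",
    validate="many_to_one",  # raise if df_seg has duplicate (Sitio, code) pairs
)

# A left merge keeps df_g's row order, so the values line up positionally
df_g["Bidfloor"] = merged["Precio"].to_numpy()
print(df_g)
```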

Answer 3

A simpler and cleaner way is to use the index: give both data frames the same index, then assign Precio:

# Keep Country intact; index on the country code (text before the '-') plus Sitio
df_g["code"]   = df_g["Country"].str.split("-").str[0].str.strip()
df_seg["code"] = df_seg["Country"].str.split("-").str[0].str.strip()

df_g   = df_g.set_index(["code", "Sitio"])
df_seg = df_seg.set_index(["code", "Sitio"])

# Assignment aligns on the index; rows with no match get NaN
df_g["Bidfloor"] = df_seg["Precio"]
df_g = df_g.reset_index(level="Sitio").reset_index(drop=True)

              Sitio              Country  Bidfloor
0  Los Andes Online        HN - Honduras       0.5
1          Guarda14  US - Estados Unidos       2.1
2          Guarda14            PE - Peru       NaN
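The index idea can also be applied without changing df_g's index or its Country column. The sketch below builds a Precio lookup Series indexed by (code, Sitio) and reindexes it with the matching keys from df_g. The helper names prices, keys and code are illustrative only. It assumes the 2-letter code is the first 2 characters of Country and that each (code, Sitio) pair appears at most once in df_seg, because reindex needs a unique index.

```python
import pandas as pd

df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# Lookup Series: Precio indexed by (country code, Sitio)
prices = (df_seg.assign(code=df_seg["Country"].str[:2])
                .set_index(["code", "Sitio"])["Precio"])

# Matching keys for each df_g row; Country itself is left untouched
keys = pd.MultiIndex.from_arrays(
    [df_g["Country"].str[:2], df_g["Sitio"]], names=["code", "Sitio"]
)

# reindex looks up each key and returns NaN where there is no match
df_g["Bidfloor"] = prices.reindex(keys).to_numpy()
print(df_g)
```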
