Merging on 2 columns in Python

Time: 2018-04-06 14:40:02

Tags: python pandas

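I need to do the same thing as my current code: df_g['Bidfloor'] = df_g[['Sitio', 'Country']].merge(df_seg, how='left').Precio but matching Country on only its first 2 characters instead of on the whole value, because I cannot change the language of the data. So I want the merge to read only the first 2 characters of the Country column, not the entire string.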

df_g:

Sitio,Country
Los Andes Online,HN - Honduras
Guarda14,US - Estados Unidos
Guarda14,PE - Peru

df_seg:

Sitio,Country,Precio
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1

What I need:

Sitio,Country,Bidfloor
Los Andes Online,HN - Honduras,0.5
Guarda14,US - United States,2.1
Guarda14,PE - Peru,NULL

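3 answers:

Answer 0 (score: 0)

You need an extra key to help the merge. I use cumcount to tell the repeated Sitio values apart (df1 and df2 below presumably stand for df_g and df_seg). Note that this pairs rows by their position within each Sitio, not by country code.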

df1.assign(key=df1.groupby('Sitio').cumcount()).merge(
    df2.assign(key=df2.groupby('Sitio').cumcount()).drop(columns='Country'),
    how='left',
    on=['Sitio', 'key'])
Out[1491]: 
              Sitio              Country  key  Precio
0  Los Andes Online        HN - Honduras    0     0.5
1          Guarda14  US - Estados Unidos    0     2.1
2          Guarda14            PE - Peru    1     NaN
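
To make the behaviour of the cumcount key concrete, here is a minimal sketch using the sample data from the question, with the two Guarda14 rows of df_g deliberately swapped. That reordering is an assumption made for illustration only. It shows that the key numbers rows within each Sitio, so the price follows row position rather than the country code.

```python
import pandas as pd

# Sample data from the question; the two Guarda14 rows in df_g are swapped
# here on purpose (illustrative assumption) so that PE comes before US.
df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "PE - Peru", "US - Estados Unidos"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# cumcount numbers the rows inside each Sitio group: 0, 1, 2, ...
key_g = df_g.groupby("Sitio").cumcount()
key_seg = df_seg.groupby("Sitio").cumcount()

out = df_g.assign(key=key_g).merge(
    df_seg.assign(key=key_seg).drop(columns="Country"),
    how="left",
    on=["Sitio", "key"],
)
print(out)
```

With this ordering the PE row picks up the price listed for US, so the approach is only safe when both frames list the countries of each Sitio in the same order.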

Answer 1 (score: 0)

You can do this by adding a temporary merge column and dropping it afterwards:

df_seg['merge_col'] = df_seg.Country.apply(lambda x: x.split('-')[0])

df_g['merge_col'] = df_g.Country.apply(lambda x: x.split('-')[0])

Then do:

df = pd.merge(df_g, df_seg[['Sitio', 'merge_col', 'Precio']], on=['Sitio', 'merge_col'], how='left').drop(columns='merge_col')

which returns

Sitio   Country Precio
0   Los Andes Online    HN - Honduras   0.5
1   Guarda14    US - Estados Unidos 2.1
2   Guarda14    PE - Peru   NaN
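
For reference, the same idea as a self-contained sketch built from the sample data in the question. The helper column name code and the use of .str[:2] to take the first two characters are choices made here for illustration, not part of the answer above. The merge uses both Sitio and the code, and Precio is renamed to Bidfloor to match the requested output.

```python
import pandas as pd

# Sample data copied from the question.
df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

# "code" is a hypothetical helper column: the first two characters of Country.
prices = df_seg.assign(code=df_seg["Country"].str[:2])[["Sitio", "code", "Precio"]]

merged = (
    df_g.assign(code=df_g["Country"].str[:2])
    .merge(prices, on=["Sitio", "code"], how="left")  # keep every row of df_g
    .drop(columns="code")                             # remove the helper column
    .rename(columns={"Precio": "Bidfloor"})
)
print(merged)
```

Using assign keeps df_g and df_seg themselves unchanged, which avoids having to clean up the helper column on the original frames.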

Answer 2 (score: 0)

A simpler, cleaner way is to use the index. Set both data frames to the same index, then assign Precio.

# Put the country code in a helper column so the full Country text is kept
df_g["Code"]   = df_g["Country"].apply(lambda k: k.split("-")[0])
df_seg["Code"] = df_seg["Country"].apply(lambda k: k.split("-")[0])

df_g   = df_g.set_index(["Code", "Sitio"])
df_seg = df_seg.set_index(["Code", "Sitio"])

# Assignment aligns on the shared (Code, Sitio) index
df_g["Bidfloor"] = df_seg["Precio"]
df_g.reset_index().drop(columns="Code")

Sitio                   Country                  Bidfloor
Los Andes Online        HN - Honduras            0.5
Guarda14                US - Estados Unidos      2.1
Guarda14                PE - Peru                NaN
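
The index approach relies on pandas aligning the assigned Series by index label rather than by row position. A minimal sketch of that, assuming a hypothetical helper function with_code_index and a helper index level named Code (neither appears in the answer above), and assuming the (Code, Sitio) pairs in df_seg are unique:

```python
import pandas as pd

# Sample data copied from the question.
df_g = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14", "Guarda14"],
    "Country": ["HN - Honduras", "US - Estados Unidos", "PE - Peru"],
})
df_seg = pd.DataFrame({
    "Sitio": ["Los Andes Online", "Guarda14"],
    "Country": ["HN - Honduras", "US - United States"],
    "Precio": [0.5, 2.1],
})

def with_code_index(df):
    # "Code" is a hypothetical helper level: the text before the first "-".
    code = df["Country"].str.split("-").str[0].str.strip()
    return df.assign(Code=code).set_index(["Code", "Sitio"])

left = with_code_index(df_g)
right = with_code_index(df_seg)

# Label-based assignment needs unique labels on the right-hand side.
assert right.index.is_unique

# Values are matched on (Code, Sitio); labels missing from right become NaN.
left["Bidfloor"] = right["Precio"]
result = left.reset_index()[["Sitio", "Country", "Bidfloor"]]
print(result)
```

If df_seg could contain the same (Code, Sitio) pair more than once, a merge is likely the safer choice, since the assignment cannot align against duplicate labels.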