I am trying to insert values into one DataFrame based on a comparison with another DataFrame. Here is an example:
>>> import pandas as pd
>>> import numpy as np
>>> df
name
0 richard Finn, Tim Maltby
1 Fernando Lebrija
>>> df2
Fullname id
0 richard Finn 500
1 Tim Maltby 699
2 Fernando Lebrija 300
The desired output is:
>>> df
name id
0 richard Finn, Tim Maltby 500,699
1 Fernando Lebrija 300
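To make the answers reproducible, the two sample frames can be rebuilt from the printouts above. This only recreates the question's data; `df` and `df2` are the question's own names.

```python
import pandas as pd

# Sample data copied from the question's printouts
df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija"]})
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})
print(df)
print(df2)
```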
I tried:
df['id'] = np.where((df['name']==df2['Fullname']), df2['id]', df['id'])
But it gives me the following error: `SyntaxError: invalid syntax`
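The `SyntaxError` comes from the misplaced quote in `df2['id]'`. The sketch below rebuilds the sample data and fixes only that quote, to show that the `np.where` approach still fails. pandas refuses `==` between two Series whose indexes differ (0..1 versus 0..2), and a cell such as `"richard Finn, Tim Maltby"` could never equal a single full name anyway.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija"]})
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})

error = None
try:
    # Quote fixed: df2['id'] instead of df2['id]'.
    # The comparison raises before the (nonexistent) df['id'] is even looked up.
    df["id"] = np.where(df["name"] == df2["Fullname"], df2["id"], df["id"])
except ValueError as exc:
    # Series with differently labeled indexes cannot be compared element-wise
    error = exc
print(error)
```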
Answer 0 (score: 2)
You can split, explode, then map and group:
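df['id'] = (df['name'].str.split(r',\s*')         # split on a comma plus any following spaces
            .explode()                            # one name per row, original index repeated
            .map(df2.set_index('Fullname')['id']) # look up each name's id
            .groupby(level=0).agg(list)           # collect the ids back per original row
           )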
Output:
name id
0 richard Finn, Tim Maltby [500, 699]
1 Fernando Lebrija [300]
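This answer leaves lists in `id`. To get the comma-separated string shown in the desired output, one variation (a sketch, assuming every name in `df` appears in `df2`) converts the looked-up ids to strings and joins them per original row:

```python
import pandas as pd

df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija"]})
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})

lookup = df2.set_index("Fullname")["id"]  # Fullname -> id
df["id"] = (
    df["name"].str.split(r",\s*")   # list of names per row
    .explode()                      # one name per row, original index repeated
    .map(lookup)                    # name -> id
    .astype(str)                    # 500 -> '500'
    .groupby(level=0)
    .agg(",".join)                  # '500,699' for row 0, '300' for row 1
)
print(df)
```

A name missing from `df2` would map to `NaN`, which turns the ids into floats (`'500.0'`) and inserts `'nan'`; drop or fill those values before joining if that can happen.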
Answer 1 (score: 2)
Another approach, using a list comprehension:
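mapper = df2.set_index('Fullname')['id'].to_dict()  # {'richard Finn': 500, ...}
# Split each cell on commas, strip spaces, look up each name ('' if unknown), re-join
df['id'] = df['name'].apply(
    lambda x: ','.join([str(mapper.get(i.strip(), '')) for i in x.split(',')])
)
Output: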
name id
0 richard Finn, Tim Maltby 500,699
1 Fernando Lebrija 300
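Because `mapper.get(..., '')` falls back to an empty string, a name with no match in `df2` leaves a blank slot instead of raising. A short check, using a hypothetical extra row `"richard Finn, Jane Doe"` that is not part of the question's data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija",
                            "richard Finn, Jane Doe"]})  # last row is hypothetical
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})

mapper = df2.set_index("Fullname")["id"].to_dict()
df["id"] = df["name"].apply(
    # Unknown names ('Jane Doe') become '' rather than raising KeyError
    lambda x: ",".join([str(mapper.get(i.strip(), "")) for i in x.split(",")])
)
print(df)
```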
Answer 2 (score: 2)
We can also use `Series.replace` with a dict and `regex=True`:
s = dict(df2[['Fullname', 'id']].astype(str).to_numpy())  # {'richard Finn': '500', ...}
df['id'] = df['name'].replace(s, regex=True)  # each full name is used as a regex pattern
print(df)
name id
0 richard Finn, Tim Maltby 500, 699
1 Fernando Lebrija 300
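With `regex=True` each full name is treated as a regular expression, so characters such as `.` or `(` in a name would be misread, and a name can also match inside a longer one. A more defensive variant (a sketch, assuming names begin and end with word characters so that `\b` applies) escapes each name and anchors it at word boundaries:

```python
import re

import pandas as pd

df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija"]})
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})

# Escaped, word-bounded pattern -> id as a string
patterns = {
    rf"\b{re.escape(name)}\b": str(i)
    for name, i in zip(df2["Fullname"], df2["id"])
}
df["id"] = df["name"].replace(patterns, regex=True)  # substring substitution per pattern
print(df)
```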
Answer 3 (score: 1)
We can use `str.split`, `stack`, and `merge`:
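final = pd.merge(
    df["name"]
    .str.split(",", expand=True)  # one column per name
    .stack()                      # back to long form; index = (row, position)
    .str.strip()                  # drop the space after each comma
    .to_frame("Fullname")
    .reset_index(level=0),        # original row number becomes column "level_0"
    df2,
    on="Fullname",
).astype(str).groupby("level_0").agg(",".join).rename_axis("", axis=0)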
print(final)
Fullname id
0 richard Finn,Tim Maltby 500,699
1 Fernando Lebrija 300
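The inner `merge` drops any row whose names are all missing from `df2`, and the result is a separate frame. To write the ids back onto `df` instead, one option is to group on the original row number and assign; pandas aligns on the index, so an unmatched row gets `NaN`. The extra row `"Jane Doe"` below is hypothetical, added only to show that case:

```python
import pandas as pd

df = pd.DataFrame({"name": ["richard Finn, Tim Maltby", "Fernando Lebrija",
                            "Jane Doe"]})  # last row is hypothetical
df2 = pd.DataFrame({
    "Fullname": ["richard Finn", "Tim Maltby", "Fernando Lebrija"],
    "id": [500, 699, 300],
})

long = (
    df["name"].str.split(",")
    .explode()              # one name per row, original index repeated
    .str.strip()
    .rename("Fullname")
    .reset_index()          # original row number lands in column "index"
)
merged = long.merge(df2, on="Fullname")  # inner join: 'Jane Doe' has no match
ids = merged.astype({"id": str}).groupby("index")["id"].agg(",".join)
df["id"] = ids              # aligned on the index; row 2 becomes NaN
print(df)
```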