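In the example below, I am trying to create a new column, df1['new']. I want to look up the values of df1['city'] and check whether each one is a substring of any row in df2['des']. If it is, I want df1['new'] to hold the matching value from df2['des'] (in this example, a description of the city).
df1['city']: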
city
0 New York
1 Amsterdam
2 London
3 Karachi
df2['des']:
   des
0  London is the capital and ...
1  Amsterdam and New York are two...
2  Karachi is the capital of...
This is what I want:
   city       new
0  New York   Amsterdam and New York are two...
1  Amsterdam  Amsterdam and New York are two...
2  London     London is the capital and ...
3  Karachi    Karachi is the capital of...
At the moment, the closest I have come is:
df['new'] = df.loc[df.des.str.contains("London"), 'des']
Which outputs:
   city       new
0  New York   NaN
1  Amsterdam  NaN
2  London     London is the capital and ...
3  Karachi    NaN
Instead of passing only "London" in the condition, I want to pass the whole series df1['city']. If I do that, I get an error.
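Answer 0 (score: 0)
This assumes that, when a city matches more than one description, you only want one of the matches. Otherwise, any solution becomes more complicated.
For problems like this, it is usually better to iterate over the cities and use pd.Series.str.contains than to iterate over the rows. For example, you can build a dictionary: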
d = {city: df2.loc[df2['des'].str.contains(city, regex=False), 'des'].iat[0]
     for city in df1['city']}
Then map it onto df1 with pd.Series.map:
df1['des'] = df1['city'].map(d).fillna('No match found!')
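One thing to watch with the dictionary above: .iat[0] raises an IndexError for a city that appears in no description, so the fillna fallback is never reached. A minimal sketch of a variant that skips such cities, using the question's sample data (descriptions kept truncated as posted) plus a hypothetical extra city, 'Paris', added here only to show the no-match case:

```python
import pandas as pd

# Sample data from the question; 'Paris' is a hypothetical addition
# used only to demonstrate a city with no matching description.
df1 = pd.DataFrame({'city': ['New York', 'Amsterdam', 'London', 'Karachi', 'Paris']})
df2 = pd.DataFrame({'des': ['London is the capital and ...',
                            'Amsterdam and New York are two...',
                            'Karachi is the capital of...']})

d = {}
for city in df1['city']:
    # Plain substring test (regex=False), so city names are taken literally
    hits = df2.loc[df2['des'].str.contains(city, regex=False), 'des']
    if not hits.empty:         # cities without a match are left out of the dict
        d[city] = hits.iat[0]  # keep only the first matching description

# Cities missing from the dict map to NaN, which fillna then replaces
df1['des'] = df1['city'].map(d).fillna('No match found!')
print(df1)
```

Leaving unmatched cities out of the dictionary lets map produce NaN for them, which is what fillna expects.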
Answer 1 (score: 0)
Another solution, using a list comprehension:
df1['new'] = [next((i for i in df2['des'] if x in i), 'Not found!') for x in df1['city']]
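The comprehension relies on next() with a default value: the generator yields descriptions lazily, next() stops at the first one containing the city, and the default is returned when nothing matches. A small sketch that spells this out; the helper name first_containing and the extra city 'Paris' are made up for illustration:

```python
import pandas as pd

# Sample data from the question; 'Paris' is a hypothetical no-match city.
df1 = pd.DataFrame({'city': ['New York', 'Amsterdam', 'London', 'Karachi', 'Paris']})
df2 = pd.DataFrame({'des': ['London is the capital and ...',
                            'Amsterdam and New York are two...',
                            'Karachi is the capital of...']})

def first_containing(city, descriptions, default='Not found!'):
    """Return the first description that contains `city`, else `default`."""
    # The generator is consumed only up to the first hit
    return next((text for text in descriptions if city in text), default)

df1['new'] = [first_containing(city, df2['des']) for city in df1['city']]
print(df1)
```

This does one Python-level scan of df2['des'] per city, which is probably fine for small frames but may be slow for large ones.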
Another, using a regular expression and str.extractall:
matches = df2['des'].str.extractall('({})'.format('|'.join(df1['city']))).reset_index(0)
m = matches.set_index(0)['level_0'].map(df2['des'])
df1['new'] = df1['city'].map(m).fillna('No match!')
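Two caveats with the extractall version, as far as I can tell: city names are inserted into the pattern unescaped, and a city that matches several descriptions gives m a duplicated index, which map rejects. A hedged sketch that escapes the names with re.escape and keeps only the first match per city (sample data from the question, plus a hypothetical 'Paris' for the no-match case):

```python
import re
import pandas as pd

# Sample data from the question; 'Paris' is a hypothetical no-match city.
df1 = pd.DataFrame({'city': ['New York', 'Amsterdam', 'London', 'Karachi', 'Paris']})
df2 = pd.DataFrame({'des': ['London is the capital and ...',
                            'Amsterdam and New York are two...',
                            'Karachi is the capital of...']})

# Escape each city so characters such as '.' or '(' are matched literally
pattern = '({})'.format('|'.join(re.escape(city) for city in df1['city']))

# One row per match; after reset_index(0) the columns are
# 'level_0' (row label in df2) and 0 (the matched city)
matches = df2['des'].str.extractall(pattern).reset_index(0)

# Keep the first description per city so the lookup index is unique,
# then turn each df2 row label into its description text
m = (matches.drop_duplicates(subset=[0])
            .set_index(0)['level_0']
            .map(df2['des']))

df1['new'] = df1['city'].map(m).fillna('No match!')
print(df1)
```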