For each value in a series, return the value from another pandas Series if the series1 value is a substring of a series2 value

Time: 2018-10-20 19:09:33

Tags: python pandas dataframe

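In the example below, I am trying to create a new column, df1['new']. I want to look up the values of df1['city'] and check whether they are substrings of any row in df2['des']. If so, I want df1['new'] to hold the matching value of df2['des'] (in this example, the description of the city).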

df1['city']:

        city
0   New York
1  Amsterdam
2     London
3    Karachi

df2['des']:

    des
0   London is the capital and ...
1   Amsterdam and New York are two...
2   Karachi is the capital of...

This is what I want:

        city                                  new
0   New York    Amsterdam and New York are two...
1  Amsterdam    Amsterdam and New York are two...
2     London        London is the capital and ...
3    Karachi         Karachi is the capital of...

At the moment, the closest I have come to a solution is:

df['new'] = df.loc[df.des.str.contains("London"), 'des']

Which outputs:

        city                            new
0   New York                            NaN
1  Amsterdam                            NaN
2     London  London is the capital and ...
3    Karachi                            NaN

What I want is to pass the whole series df1['city'] in the condition instead of only "London". If I do that, I get an error.

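2 answers:

Answer 0: (score: 0)

I assume that, where a city matches more than one description, you want only one match. Otherwise, any solution becomes more complicated.

With problems like this, it is usually better to iterate over the cities rather than over the rows, and to use pd.Series.str.contains. For example, you can build a dictionary: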

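# Map each city to the first description that contains it (literal substring match).
# Note: .iat[0] raises IndexError if a city matches no description at all.
d = {city: df2.loc[df2['des'].str.contains(city, regex=False), 'des'].iat[0]
     for city in df1['city']}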

Then map the dictionary onto df1 via pd.Series.map:

df1['des'] = df1['city'].map(d).fillna('No match found!')
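The two steps above can be combined into a self-contained sketch. The frames are rebuilt from the question's truncated sample data. The extra city "Paris" and the helper name `first_match` are assumptions added here for illustration; they show the no-match case, where calling `.iat[0]` on an empty selection would otherwise raise `IndexError`.

```python
import pandas as pd

# Sample data rebuilt from the question (the descriptions are truncated there).
# "Paris" is a hypothetical extra row, added only to show the no-match case.
df1 = pd.DataFrame({"city": ["New York", "Amsterdam", "London", "Karachi", "Paris"]})
df2 = pd.DataFrame({"des": [
    "London is the capital and ...",
    "Amsterdam and New York are two...",
    "Karachi is the capital of...",
]})


def first_match(city, descriptions):
    """Return the first description containing `city`, or None if none does."""
    hits = descriptions[descriptions.str.contains(city, regex=False)]
    return hits.iat[0] if not hits.empty else None


# One lookup per distinct city, then map the results back onto df1.
d = {city: first_match(city, df2["des"]) for city in df1["city"].unique()}
df1["des"] = df1["city"].map(d).fillna("No match found!")
print(df1)
```

Returning None for unmatched cities lets `fillna` supply the placeholder text, so the dictionary can be built without an exception.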

Answer 1: (score: 0)

Another solution, using a list comprehension:

df1['new'] = [next((i for i in df2['des'] if x in i), 'Not found!') for x in df1['city']]
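As a quick check, the one-liner can be run against the question's sample data. This is only a sketch: the frames are rebuilt from the truncated values shown in the question. `next(generator, default)` returns the first description that contains the city and falls back to the default when the generator is exhausted.

```python
import pandas as pd

# Sample data rebuilt from the question (the descriptions are truncated there).
df1 = pd.DataFrame({"city": ["New York", "Amsterdam", "London", "Karachi"]})
df2 = pd.DataFrame({"des": [
    "London is the capital and ...",
    "Amsterdam and New York are two...",
    "Karachi is the capital of...",
]})

# For each city, take the first description containing it, else a placeholder.
df1["new"] = [
    next((des for des in df2["des"] if city in des), "Not found!")
    for city in df1["city"]
]
print(df1)
```

This scans df2['des'] in plain Python once per city, which is fine for small frames but may be slow for large ones.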

Another, using a regular expression and str.extractall:

matches = df2['des'].str.extractall('({})'.format('|'.join(df1['city']))).reset_index(0)
m = matches.set_index(0)['level_0'].map(df2['des'])
df1['new'] = df1['city'].map(m).fillna('No match!')
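A slightly more defensive variant of the same idea is sketched below. It is not the answerer's code. It assumes that city names may contain regex metacharacters (so they are escaped with `re.escape`) and that a city may appear in several descriptions (so only the first is kept, which keeps the lookup index unique for `map`). The names `pattern`, `pairs`, and `lookup`, and the named group `city`, are illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (the descriptions are truncated there).
df1 = pd.DataFrame({"city": ["New York", "Amsterdam", "London", "Karachi"]})
df2 = pd.DataFrame({"des": [
    "London is the capital and ...",
    "Amsterdam and New York are two...",
    "Karachi is the capital of...",
]})

# Escape each city so characters such as "." or "(" are matched literally.
pattern = "(?P<city>{})".format("|".join(re.escape(c) for c in df1["city"]))

# One row per match; level 0 of the resulting index is the df2 row label.
matches = df2["des"].str.extractall(pattern)

# Pair every matched city with the description it was found in.
pairs = pd.DataFrame({
    "city": matches["city"].to_numpy(),
    "des": df2["des"].loc[matches.index.get_level_values(0)].to_numpy(),
})

# Keep the first description per city so the lookup index is unique.
lookup = pairs.drop_duplicates("city").set_index("city")["des"]
df1["new"] = df1["city"].map(lookup).fillna("No match!")
print(df1)
```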