Python Pandas - Check whether a string contains a substring and set a new column to that substring

Posted: 2017-05-05 10:41:59

Tags: python string pandas

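I need to check whether a string contains any of several substrings and set a new column to the matching substring. I'm currently trying this: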

df['NEW_COL'] = df['COL_TO_CHECK'].str.contains('|'.join(substring_list))

Instead of getting a boolean True/False from the contains check, I need the actual matching value from substring_list to populate df['NEW_COL'].

Substrings to check for:
substring_list = ['apple', 'banana', 'cherry']

Resulting DataFrame:

OLD_COL              NEW_COL
apple pie            apple
black cherry         cherry
banana lemon drop    banana
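For reference, pandas also provides Series.str.extract, which returns the text matched by a capture group directly instead of a boolean. This is a different technique from either answer below. A minimal sketch, assuming the column is named OLD_COL as in the result table:

```python
import re

import pandas as pd

substring_list = ['apple', 'banana', 'cherry']
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop']})

# One capture group of alternatives; re.escape guards against regex metacharacters
pattern = '(' + '|'.join(map(re.escape, substring_list)) + ')'

# str.extract returns the leftmost match of the group per row (NaN when nothing matches);
# expand=False returns a Series rather than a one-column DataFrame
df['NEW_COL'] = df['OLD_COL'].str.extract(pattern, expand=False)
print(df)
```

Rows with no match get NaN, which is easy to filter or fill afterwards.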

2 answers:

Answer 0 (score: 2)

You haven't been very clear about what your data looks like or what you want, but the general principle is that you can use:

df['NEW_COL'] = df['COL_TO_CHECK'].apply(lambda x: do_something(x) if is_something(x) else x)
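A concrete instance of this apply pattern, assuming a hypothetical helper find_substring that folds the is_something/do_something steps into one function. It uses a plain substring test, so unlike whole-word matching it also finds 'apple' inside 'pineapple':

```python
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']


def find_substring(text):
    """Hypothetical helper: return the first entry of substring_list found in text, else None."""
    for sub in substring_list:
        if sub in text:  # plain substring check, not whole-word
            return sub
    return None


df = pd.DataFrame({'COL_TO_CHECK': ['apple pie', 'black cherry', 'banana lemon drop', 'plain toast']})
df['NEW_COL'] = df['COL_TO_CHECK'].apply(find_substring)
print(df)
```

When a string contains several substrings, this returns the one that comes first in substring_list, not the one that appears first in the text.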

Or, in your example:

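substring_list = {'apple', 'banana', 'cherry'}
# Split into words and intersect with the set; next(iter(...), None) avoids the
# KeyError that set.pop() raises when no word matches
df['NEW_COL'] = df['OLD_COL'].apply(lambda x: next(iter(set(x.split()) & substring_list), None))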

Using a set makes the membership check faster :)
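Because this approach splits on whitespace, it only matches whole words. A small sketch with made-up rows showing where that matters: 'pineapple' and 'cherry,' (trailing comma) are not in the set, so those rows get no match:

```python
import pandas as pd

substring_list = {'apple', 'banana', 'cherry'}
df = pd.DataFrame({'OLD_COL': ['apple pie', 'pineapple tart', 'cherry, sour']})

# Whole-word intersection with a None fallback for rows without a matching word
df['NEW_COL'] = df['OLD_COL'].apply(
    lambda x: next(iter(set(x.split()) & substring_list), None))
print(df)
```

If several words match, the one returned is arbitrary, since sets are unordered.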

Answer 1 (score: 2)

I would do it like this:

In [148]: df
Out[148]:
             OLD_COL
0          apple pie
1       black cherry
2  banana lemon drop

In [149]: pat = '.*({}).*'.format('|'.join(substring_list))

In [150]: pat
Out[150]: '.*(apple|banana|cherry).*'

In [151]: df['NEW_COL'] = df['OLD_COL'].str.replace(pat, r'\1', regex=True)

In [152]: df
Out[152]:
             OLD_COL NEW_COL
0          apple pie   apple
1       black cherry  cherry
2  banana lemon drop  banana
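A self-contained version of the session above, with two additions that are assumptions rather than part of the answer: re.escape on the substrings, and masking rows that contain none of them. Without the mask, str.replace leaves unmatched rows unchanged, so 'plain toast' would be copied into NEW_COL as-is. Also note that the greedy leading .* makes the pattern capture the last matching substring when a row contains more than one.

```python
import re

import pandas as pd

substring_list = ['apple', 'banana', 'cherry']
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop', 'plain toast']})

# Same pattern shape as the answer: '.*(apple|banana|cherry).*'
alternatives = '|'.join(map(re.escape, substring_list))
pat = '.*({}).*'.format(alternatives)

# regex=True is required on pandas >= 2.0, where str.replace matches literally by default
replaced = df['OLD_COL'].str.replace(pat, r'\1', regex=True)

# Unmatched rows come back unchanged, so turn them into NaN explicitly
has_match = df['OLD_COL'].str.contains(alternatives, regex=True)
df['NEW_COL'] = replaced.where(has_match)
print(df)
```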