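I need to check whether a string contains any of several substrings and set a new column to the substring that matched. I am currently trying this:
df['NEW_COL'] = df['COL_TO_CHECK'].str.contains('|'.join(substring_list))
Instead of the boolean True/False that contains returns, I need the actual value from substring_list that matched,
so that I can populate df['NEW_COL'] with it.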
substring_list = ['apple', 'banana', 'cherry']
OLD_COL             NEW_COL
apple pie           apple
black cherry        cherry
banana lemon drop   banana
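Answer 0 (score: 2)
You were not very clear about what your data looks like or what you want, but the general principle is that you can use:
df['NEW_COL'] = df['COL_TO_CHECK'].apply(lambda x: do_something(x) if is_something(x) else x)
Or, for your example: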
substring_list = {'apple', 'banana', 'cherry'}
df['NEW_COL'] = df['OLD_COL'].apply(lambda x: set(x.split()).intersection(substring_list).pop())
Using a set is faster :)
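One caveat with the set-based one-liner above: pop() raises KeyError on an empty set, so a row containing none of the words would fail, and the comparison is on whole whitespace-separated words only. The following is a minimal sketch of a more defensive variant, assuming unmatched rows should get a missing value. The helper name first_match and the extra 'plain toast' row are hypothetical and not part of the original question.

```python
import pandas as pd

substring_list = {'apple', 'banana', 'cherry'}

def first_match(text):
    """Return the first whitespace-separated word that is in substring_list, else None."""
    for word in text.split():
        if word in substring_list:
            return word
    return None

# 'plain toast' is an illustrative row with no match.
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop', 'plain toast']})
df['NEW_COL'] = df['OLD_COL'].apply(first_match)
```

Unlike the set intersection, this returns the first match in word order, which makes the result deterministic when a row contains more than one of the words.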
Answer 1 (score: 2)
I would do it this way:
In [148]: df
Out[148]:
             OLD_COL
0          apple pie
1       black cherry
2  banana lemon drop
In [149]: pat = '.*({}).*'.format('|'.join(substring_list))
In [150]: pat
Out[150]: '.*(apple|banana|cherry).*'
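# regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern literally by default
In [151]: df['NEW_COL'] = df['OLD_COL'].str.replace(pat, r'\1', regex=True)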
In [152]: df
Out[152]:
             OLD_COL  NEW_COL
0          apple pie    apple
1       black cherry   cherry
2  banana lemon drop   banana
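Note that with str.replace, a row that matches none of the substrings is left unchanged, so NEW_COL would simply repeat OLD_COL for that row. As an alternative (not what this answer uses), a sketch with str.extract returns NaN for such rows instead. It assumes the substrings should be matched literally, hence re.escape, and the 'plain toast' row is added here purely for illustration.

```python
import re
import pandas as pd

substring_list = ['apple', 'banana', 'cherry']
# 'plain toast' is an illustrative row with no match.
df = pd.DataFrame({'OLD_COL': ['apple pie', 'black cherry', 'banana lemon drop', 'plain toast']})

# Escape each substring so that regex metacharacters are matched literally.
pat = '({})'.format('|'.join(re.escape(s) for s in substring_list))

# With a single capture group, expand=False returns a Series rather than a DataFrame.
df['NEW_COL'] = df['OLD_COL'].str.extract(pat, expand=False)
```

Because the pattern has no surrounding .*, this picks up the leftmost match in each string, whereas the greedy '.*(...).*' pattern above keeps the last one when a string contains several of the substrings.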