This is a follow-up to my previous post, Map partial string from dictionary in Pandas. I have slightly modified the mapping dictionary:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe', 'aaaa Norway bbbb',
                   'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'])
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5, 'Ital': 1, 'German': 0.9}

# Default multiplier for rows that match no key
df['multiple'] = 1
# Each matching key overwrites the value set by earlier keys
for k, v in d.items():
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])
print(df)
Output I get:
                    n  A  multiple
0  abcde Germany fffe  3       0.9
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.0
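Both wrong values have the same cause: str.contains does a substring (regex) search, so 'German' also matches 'Germany' and 'Ital' also matches 'Italy', and because the loop visits those shorter keys after the full names, np.where overwrites 0.5 and 1.5 with 0.9 and 1. A small sketch that makes this visible on the question's strings (the names hits and last_key are just for illustration):

```python
import pandas as pd

d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5, 'Ital': 1, 'German': 0.9}
s = pd.Series(['abcde Germany fffe', 'aaaa Norway bbbb',
               'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd'])

# One boolean column per key, in the same order the loop visits the keys
hits = pd.DataFrame({k: s.str.contains(k) for k in d})
hits.index = s
print(hits)

# The loop keeps the value of the LAST matching key in each row
last_key = {text: [k for k in d if row[k]][-1] if row.any() else None
            for text, row in hits.iterrows()}
print(last_key)
```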
Expected:
                    n  A  multiple
0  abcde Germany fffe  3       0.5
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.5
Any suggestions on how to get the expected output would be very helpful.
Answer 0 (score: 1)
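Here is one approach (similar to the linked post): use str.extract to pull out the dictionary key that appears as a whole word in each string, map it to its value with Series.map, and then fillna with 1 for rows that have no match. The \b word boundaries are what fix the problem: because a boundary is required right after the key, 'Ital' and 'German' can no longer match inside 'Italy' and 'Germany'.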
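# Alternation of all keys, anchored at word boundaries on both sides
pat = r'\b(?:{})\b'.format('|'.join(d.keys()))
# Extract the first matching key, map it to its value, default to 1
df['multiple'] = df['n'].str.extract('(' + pat + ')', expand=False).map(d).fillna(1)
print(df)  # column A is random, so it differs from the question's output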
                    n  A  multiple
0  abcde Germany fffe  5       0.5
1    aaaa Norway bbbb  4       1.0
2         tttt Sweden  1       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  0       1.5
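If you would rather keep the original loop, the same word-boundary idea can be applied inside str.contains. This is a sketch assuming every key should only match as a whole word; re.escape is an addition (not in the answer above) so that keys containing regex metacharacters are matched literally, and column A is left out because it is random.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
                         'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd']})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5, 'Ital': 1, 'German': 0.9}

df['multiple'] = 1.0  # default for rows with no whole-word match
for k, v in d.items():
    # \b on both sides: 'German' no longer matches inside 'Germany'
    mask = df['n'].str.contains(r'\b' + re.escape(k) + r'\b')
    df['multiple'] = np.where(mask, v, df['multiple'])
print(df)
```

One behavioural difference: if a string contained two different whole-word keys, this loop would keep the value of the key that comes last in d, whereas str.extract uses the leftmost match in the string.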