Map partial string from dictionary in Pandas (again)

Date: 2020-07-05 13:17:53

Tags: python-3.x pandas dictionary mapping

This is a follow-up to my previous post, "Map partial string from dictionary in Pandas".

I have slightly modified the mapping dictionary so that some keys are now substrings of other keys:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,10,size=(5, 1)), columns=list('A'))
df.insert(0, 'n', ['abcde Germany fffe','aaaa Norway bbbb',
                   'tttt Sweden','Croatia dfdfdf','Italy sfsd'])

d = {'Germany':0.5, 'Croatia':1.5, 'Italy':1.5, 'Ital':1, 'German':0.9}

df['multiple'] = 1
for k, v in d.items():
    df['multiple'] = np.where(df['n'].str.contains(k), v, df['multiple'])

print(df)

Output obtained:

                    n  A  multiple
0  abcde Germany fffe  3       0.9
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.0

Expected:

                    n  A  multiple
0  abcde Germany fffe  3       0.5
1    aaaa Norway bbbb  7       1.0
2         tttt Sweden  5       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  3       1.5

Any suggestions on how to get the expected output would be very helpful.

1 answer:

Answer 0 (score: 1)

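Here is one approach (similar to the one in the linked post): extract the whole word that matches a dictionary key, map it to its value with series.map, and then use fillna(1) for the rows that have no match: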

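# Match any dictionary key, but only as a whole word (\b = word boundary)
pat = r'\b(?:{})\b'.format('|'.join(d.keys()))
# Extract the matched key, look up its value, and default to 1 where nothing matched
df['multiple'] = df['n'].str.extract('('+pat+')',expand=False).map(d).fillna(1)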

print(df)
                    n  A  multiple
0  abcde Germany fffe  5       0.5
1    aaaa Norway bbbb  4       1.0
2         tttt Sweden  1       1.0
3      Croatia dfdfdf  8       1.5
4          Italy sfsd  0       1.5
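
For reference, the loop in the question gives 0.9 and 1.0 because str.contains matches substrings: 'German' and 'Ital' come later in the dictionary than 'Germany' and 'Italy', so they overwrite the earlier values on the same rows. The following is a minimal, self-contained sketch of the whole-word approach, not a definitive implementation. It assumes fixed sample data (column A is left out so the result is reproducible), and the re.escape call, the longest-first sort and the names keys and matched are additions for illustration that are not part of the answer above.

```python
import re

import pandas as pd

# Fixed sample data (assumption: column A is omitted, it does not affect the mapping)
df = pd.DataFrame({'n': ['abcde Germany fffe', 'aaaa Norway bbbb',
                         'tttt Sweden', 'Croatia dfdfdf', 'Italy sfsd']})
d = {'Germany': 0.5, 'Croatia': 1.5, 'Italy': 1.5, 'Ital': 1, 'German': 0.9}

# Try longer keys first, so a longer key is preferred when two
# alternatives could match at the same position
keys = sorted(d, key=len, reverse=True)

# re.escape guards against keys that contain regex metacharacters;
# \b on both sides means 'Ital' cannot match inside 'Italy'
pat = r'\b(' + '|'.join(re.escape(k) for k in keys) + r')\b'

# One capture group + expand=False returns a Series of matched keys (NaN if none)
matched = df['n'].str.extract(pat, expand=False)

# Look up each matched key in the dictionary; unmatched rows default to 1
df['multiple'] = matched.map(d).fillna(1)
print(df)
```

With this sample data, the Germany row gets 0.5, the Croatia and Italy rows get 1.5, and the Norway and Sweden rows fall back to 1.0.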