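Map a pandas column by matching substrings in a list of dictionaries

Time: 2020-10-02 14:31:50

Tags: python-3.x pandas mapping

I have a DataFrame containing names, and a list of dictionaries holding names with a count for each name. I need to create a new column based on whether each name in results occurs in the names column. The catch is that the match is not exact: it is based on only part of the name. Every solution I have tried so far is very clumsy, so I am hoping this wonderful community can suggest something more elegant.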

import pandas as pd

dic = {"IDs": ['a','b','c','d','e','f','g','k','l','m'],
       "names": ['Ailbhe Yowa',
 'Hannah Kirst',
 'Morris Hunt',
 'Flavia Quor in the UK',
 'Sarah Smith and Alexandra Libman',
 'Flavia Morris, Mark Torre, Ann Moor',
 'Rowena Freez',
 'Adam Lion in USA',
 'Mahmood Jade  in Europe',
 'Morris Tool and  Francois Lopin']
}
test = pd.DataFrame(dic)

results = [[{'name': 'Ailbhe', 'count': 17}],
 [{'name': 'Mahmood', 'count': 2818}],
 [{'name': 'Debbie', 'count': 11493}],
 [{'name': 'Arthur', 'count': 20587}],
 [{'name': 'Clive', 'count': 2703}],
 [{'name': 'Flavia', 'count': 10166}],
 [{'name': 'Alexandra', 'count': 1939}],
 [{'name': 'Sarah', 'count': 88388}],
 [{'name': 'Morris', 'count': 3194}],
 [{'name': 'Cameron', 'count': 3334}]]

The desired output should look like this:

    IDs names                               results
0   a   Ailbhe Yowa                         [{'name': 'Ailbhe', 'count': 17}]
1   b   Hannah Kirst    
2   c   Morris Hunt                         [{'name': 'Morris', 'count': 3194}]
3   d   Flavia Quor in the UK               [{'name': 'Flavia', 'count': 10166}]
4   e   Sarah Smith and Alexandra Libman    [{'name': 'Sarah', 'count': 88388}, {'name': 'Alexandra', 'count': 1939}]
5   f   Flavia Morris, Mark Torre, Ann Moor [{'name': 'Flavia', 'count': 10166}]
6   g   Rowena Freez    
7   k   Adam Lion in USA    
8   l   Mahmood Jade in Europe              [{'name': 'Mahmood', 'count': 2818}]
9   m   Morris Tool and Francois Lopin      [{'name': 'Morris', 'count': 3194}]

2 answers:

Answer 0 (score: 1)

One way, using pandas.Series.str.findall:

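# Build a lookup from each name to its dictionary
name_dict = {l[0]["name"]: l[0] for l in results}
# Join all names into one alternation pattern, e.g. "(Ailbhe|Mahmood|...)"
reg = "(%s)" % "|".join(name_dict)
# Find every known name in each row and map it back to its dictionary
test["results"] = test["names"].str.findall(reg).apply(lambda x: [name_dict[i] for i in x])
print(test)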
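
The pattern above puts each name into the regular expression unescaped and without word boundaries, so a short name could also match inside a longer word. A minimal sketch of a stricter variant follows, assuming whole-word matches are what is wanted. The sample data is a reduced copy of the question's, and the 'Sara' entry is hypothetical, added only to show the effect of the boundaries.

```python
import re
import pandas as pd

# Reduced sample in the same shape as the question's data.
test = pd.DataFrame({
    "IDs": ["a", "b", "e"],
    "names": ["Ailbhe Yowa", "Hannah Kirst", "Sarah Smith and Alexandra Libman"],
})
results = [
    [{"name": "Sara", "count": 1}],  # hypothetical entry, not in the question
    [{"name": "Ailbhe", "count": 17}],
    [{"name": "Alexandra", "count": 1939}],
    [{"name": "Sarah", "count": 88388}],
]

# Map each name to its dictionary (each inner list holds one dict here).
name_dict = {group[0]["name"]: group[0] for group in results}

# Escape every name and require a word boundary on both sides,
# so "Sara" does not match inside "Sarah".
pattern = r"\b(?:%s)\b" % "|".join(re.escape(name) for name in name_dict)

# findall returns the matched names in the order they appear in the string.
test["results"] = test["names"].str.findall(pattern).apply(
    lambda found: [name_dict[name] for name in found]
)
print(test)
```

Rows with no match receive an empty list rather than a blank cell.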

Answer 1 (score: 0)

resultsToAdd = []
for index, row in test.iterrows():
    # Default to a blank placeholder when no name matches this row
    match = " "
    for result_row in results:
        # Keep the first entry of results whose name is a substring of the row
        if any(result['name'] in row['names'] for result in result_row):
            match = result_row
            break
    resultsToAdd.append(match)
test["results"] = resultsToAdd
print(test)
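
The loop above stops at the first entry of results that matches, so a row containing two known names keeps only one of them. A minimal sketch that collects every match with a plain substring test follows, assuming an empty list is acceptable for rows without a match. The sample data is a reduced copy of the question's, and find_matches is a helper name introduced here for illustration.

```python
import pandas as pd

# Reduced sample in the same shape as the question's data.
test = pd.DataFrame({
    "IDs": ["b", "c", "e"],
    "names": ["Hannah Kirst", "Morris Hunt", "Sarah Smith and Alexandra Libman"],
})
results = [
    [{"name": "Alexandra", "count": 1939}],
    [{"name": "Sarah", "count": 88388}],
    [{"name": "Morris", "count": 3194}],
]

# Flatten the nested list so each dictionary is checked once per row.
entries = [entry for group in results for entry in group]

def find_matches(full_name):
    """Return every entry whose name is a substring of full_name."""
    return [entry for entry in entries if entry["name"] in full_name]

test["results"] = test["names"].apply(find_matches)
print(test)
```

Matches come back in the order they appear in results, not in the order they occur in the string.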