I have a dictionary of job-category keywords, as shown below:
{'01': ['advertising, representatives, financial, miscellaneous, other, sales'],
'02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],
'03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],
'04': ['child, support, children, disable, supplemental, security, income']}
I also have a list of job titles:
child support
art director
driver
assistant specialist
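I would like to know whether there is a way to find the best-matching job category for each job title. What I want is a new column in the job title list that holds one of the dictionary's keys for each title: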
job title category
child support 04
art director 23
truck driver 03
assistant specialist 17
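The keys represent broad job categories. The values in this dictionary are the unique words obtained by splitting the names of more detailed subcategories. The problem is that a word in a job title may not appear in any category name, and it may also appear in many categories.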
Answer 0 (score: 0)
Use the following solution:
def match_job_title(category_dict, job_title):
    job_title = set(job_title.split(' '))  # make a set with job title words
    best_match = 0  # the best number of matches
    best_category = None  # the best category
    for category, keywords in category_dict.items():  # iterate through category dict
        keywords = set(keywords[0].split(', '))  # make set of keywords
        matches = len(keywords.intersection(job_title))  # intersect and get number of elements
        if matches > best_match:  # if it's better use this category
            best_match = matches
            best_category = category
    return best_category
Use it like this:
>>> d = {'01': ['advertising, representatives, financial, miscellaneous, other, sales'],
...      '02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],
...      '03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],
...      '04': ['child, support, children, disable, supplemental, security, income']}
>>> match_job_title(d, 'child support')
'04'
>>> match_job_title(d, 'financial femme fatal')
'01'
>>> match_job_title(d, 'circus clown') is None
True
>>> for job in ['child support', 'art director', 'truck driver']:
...     print(job, match_job_title(d, job))
...
child support 04
art director None
truck driver 03
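The question mentions that a word can occur in several categories; 'workers', for example, is listed under both '02' and '03'. Because match_job_title only switches category on a strictly larger match count, a tie goes to whichever category is iterated first. Below is a minimal sketch of one way to surface such ties. The helper name match_all_categories is hypothetical and not part of the original answer; it assumes the same dictionary layout as above.

```python
def match_all_categories(category_dict, job_title):
    """Return a sorted list of every category tied for the most keyword matches.

    Hypothetical helper: returns an empty list when no word matches.
    """
    title_words = set(job_title.split())
    scores = {}
    for category, keywords in category_dict.items():
        # Each value is a one-element list holding a comma-separated string.
        keyword_set = set(keywords[0].split(', '))
        scores[category] = len(keyword_set & title_words)
    best = max(scores.values(), default=0)
    if best == 0:
        return []
    return sorted(c for c, s in scores.items() if s == best)


d = {'01': ['advertising, representatives, financial, miscellaneous, other, sales'],
     '02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],
     '03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],
     '04': ['child, support, children, disable, supplemental, security, income']}

print(match_all_categories(d, 'workers'))          # tie between two categories
print(match_all_categories(d, 'freight workers'))  # one category matches both words
print(match_all_categories(d, 'circus clown'))     # no match at all
```

Returning a list leaves the tie-breaking decision to the caller, for instance preferring the category with fewer keywords.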
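A more Pythonic dictionary would look like this (match_job_title needs a small modification to work with it). The keys look like integers, so make them int, and store each category's keywords as a set rather than as a single comma-separated string.
Example: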
{1: {'advertising', 'representatives', 'financial', 'miscellaneous', 'other', 'sales'},
2: {'musicians', 'workers', 'officials', 'entertainers', 'actors', 'singers', 'competitors', 'dancers'},
3: {'movers', 'station', 'gas', 'of', 'stock', 'pumping', 'workers', 'hoist', 'mining', 'freight', 'truck'},
4: {'child', 'support', 'children', 'disable', 'supplemental', 'security', 'income'}}
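A minimal sketch of the small modification that this layout calls for: since each value is already a set, the split step disappears and the intersection can be taken directly. The name match_job_title_sets and the lower-casing of the title are assumptions added here for illustration, not part of the original answer.

```python
def match_job_title_sets(category_sets, job_title):
    """Return the key whose keyword set shares the most words with job_title."""
    # Lower-casing is an assumption so that 'Truck Driver' matches 'truck'.
    title_words = set(job_title.lower().split())
    best_match = 0
    best_category = None
    for category, keywords in category_sets.items():
        matches = len(keywords & title_words)  # values are already sets
        if matches > best_match:
            best_match = matches
            best_category = category
    return best_category


categories = {
    1: {'advertising', 'representatives', 'financial', 'miscellaneous', 'other', 'sales'},
    2: {'musicians', 'workers', 'officials', 'entertainers', 'actors', 'singers',
        'competitors', 'dancers'},
    3: {'movers', 'station', 'gas', 'of', 'stock', 'pumping', 'workers', 'hoist',
        'mining', 'freight', 'truck'},
    4: {'child', 'support', 'children', 'disable', 'supplemental', 'security', 'income'},
}

titles = ['child support', 'art director', 'truck driver', 'assistant specialist']
# Pair each title with its category to form the requested extra column.
rows = [(title, match_job_title_sets(categories, title)) for title in titles]
for title, category in rows:
    print(title, category)
```

Titles with no matching word, such as 'art director' here, get None, so categories that are missing from the dictionary cannot be assigned.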