How to get the best-matching value from a Python dictionary

Date: 2018-04-11 22:53:04

Tags: python string-matching

I have a dictionary of job category keywords that looks like this:

{'01': ['advertising, representatives, financial, miscellaneous, other, sales'],
 '02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],
 '03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],
 '04': ['child, support, children, disable, supplemental, security, income']}

I also have a list of job titles:

child support
art director
driver
assistant specialist

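I would like to know whether there is a way to find the best-matching job category for each job title. What I want is a new column in the job title list that holds one of the dictionary's keys for each title.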

job title             category     
child support           04
art director            23
truck driver            03
assistant specialist    17

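The keys represent broad job categories. The values in this dictionary are the unique words split out of more detailed subcategory names. The problem is that the words in a job title may not appear in any category name, and they may also appear in many categories.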

1 answer:

Answer 0: (score: 0)

Use the following solution:

def match_job_title(category_dict, job_title):
    job_title = set(job_title.split(' '))  # set of the words in the job title
    best_match = 0  # highest number of matching words seen so far
    best_category = None  # category with that number of matches

    for category, keywords in category_dict.items():
        # each value is a one-element list holding a comma-separated string
        keywords = set(keywords[0].split(', '))
        matches = len(keywords.intersection(job_title))  # words in common
        if matches > best_match:  # strictly better, so the first best category wins ties
            best_match = matches
            best_category = category

    return best_category

Use it like this:

>>> d={ '01': ['advertising, representatives, financial, miscellaneous, other, sales'],                                      
        '02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],                          
        '03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],                            
        '04': ['child, support, children, disable, supplemental, security, income']}

>>> match_job_title(d, 'child support')
'04'
>>> match_job_title(d, 'financial femme fatal')
'01'
>>> match_job_title(d, 'circus clown') is None
True
>>> for job in ['child support', 'art director', 'truck driver']:
...   print(job, match_job_title(d, job))
child support 04
art director None
truck driver 03
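
The question notes that a title's words may appear in several categories, and match_job_title silently keeps the first category that reaches the highest count. As a minimal sketch of one way to make such ties visible, the hypothetical helper rank_categories below (not part of the original answer) returns every category with at least one matching word, best first. It assumes the same dictionary layout as d above.

```python
d = {'01': ['advertising, representatives, financial, miscellaneous, other, sales'],
     '02': ['musicians, workers, officials, entertainers, actors, singers, competitors, dancers'],
     '03': ['movers, station, gas, of, stock, pumping, workers, hoist, mining, freight, truck'],
     '04': ['child, support, children, disable, supplemental, security, income']}


def rank_categories(category_dict, job_title):
    """Return (category, match_count) pairs with at least one match, best first.

    Hypothetical helper for illustration only.
    """
    title_words = set(job_title.split())
    scores = []
    for category, keywords in category_dict.items():
        # Each value is a one-element list holding a comma-separated string.
        keyword_set = set(keywords[0].split(', '))
        matches = len(keyword_set & title_words)
        if matches:
            scores.append((category, matches))
    # Highest count first; equal counts are ordered by category key.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


print(rank_categories(d, 'gas station workers'))  # [('03', 3), ('02', 1)]
print(rank_categories(d, 'workers'))              # [('02', 1), ('03', 1)]
print(rank_categories(d, 'circus clown'))         # []
```

When the first two entries share the same count, the title is ambiguous under plain word overlap and may need a different tie-breaking rule.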

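A more Pythonic dictionary would look like this (match_job_title needs small changes to work with it):

  • The keys appear to be ints, so make them ints
  • The values should be sets of keywords

Example: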

{1: {'advertising', 'representatives', 'financial', 'miscellaneous', 'other', 'sales'},
 2: {'musicians', 'workers', 'officials', 'entertainers', 'actors', 'singers', 'competitors', 'dancers'},
 3: {'movers', 'station', 'gas', 'of', 'stock', 'pumping', 'workers', 'hoist', 'mining', 'freight', 'truck'},
 4: {'child', 'support', 'children', 'disable', 'supplemental', 'security', 'income'}}
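
The small changes mentioned above could look roughly like the following sketch. It assumes the int-keyed, set-valued layout just shown, so the keyword string no longer has to be split. The names categories and job_titles are illustrative only.

```python
categories = {
    1: {'advertising', 'representatives', 'financial', 'miscellaneous', 'other', 'sales'},
    2: {'musicians', 'workers', 'officials', 'entertainers', 'actors', 'singers',
        'competitors', 'dancers'},
    3: {'movers', 'station', 'gas', 'of', 'stock', 'pumping', 'workers', 'hoist',
        'mining', 'freight', 'truck'},
    4: {'child', 'support', 'children', 'disable', 'supplemental', 'security', 'income'},
}


def match_job_title(category_dict, job_title):
    """Return the key whose keyword set shares the most words with job_title."""
    title_words = set(job_title.split())
    best_match = 0
    best_category = None
    for category, keywords in category_dict.items():
        # The values are already sets, so intersect them directly.
        matches = len(keywords & title_words)
        if matches > best_match:
            best_match = matches
            best_category = category
    return best_category


# Build the "new column" from the question as (title, category) pairs.
job_titles = ['child support', 'art director', 'truck driver']
for title in job_titles:
    print(title, match_job_title(categories, title))
# child support 4
# art director None
# truck driver 3
```

A title with no matching word still maps to None, as in the original function.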