Python: iterating a regex to match list elements

Posted: 2017-04-13 14:40:18

Tags: python regex iteration

Setup

While scraping housing ads in London, I get each ad's address as a single-element list, e.g.

address=['Brockham Drive, Brixton SW2']

I have a dictionary that maps London boroughs to their districts, e.g.

boroughs={ 
'Barking_Dagenham':['Barking', ..., 'Rush Green'],
'Barnet':['Arkley', ..., 'Woodside Park'],
    ⋮
'Westminster':['Bayswater', ..., 'Westminster'],
}
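Because this dictionary nests districts inside boroughs, any search over it needs two loops. One possible alternative, sketched below under the assumption that no district name is listed under two boroughs, is to flatten it into a district → borough lookup first; the search then needs only a single loop, so a single `break` is enough. The trimmed `boroughs` contents and the name `district_to_borough` are hypothetical, for illustration only.

```python
# Hypothetical, trimmed-down sample in the same shape as `boroughs` above.
boroughs = {
    'Barnet': ['Arkley', 'Woodside Park'],
    'Lambeth': ['Brixton', 'Stockwell'],
    'Westminster': ['Bayswater', 'Westminster'],
}

# Flatten into a district -> borough lookup. If the same district name were
# listed under two boroughs, the later borough would silently win.
district_to_borough = {
    distr: bor
    for bor, districts in boroughs.items()
    for distr in districts
}

address = ['Brockham Drive, Brixton SW2']
district = borough = 'unknown'  # defaults, only overwritten on a match
for distr, bor in district_to_borough.items():  # a single loop ...
    if distr in address[0]:
        district, borough = distr, bor
        break  # ... so one break is enough to stop the search

print(district, borough)  # Brixton Lambeth
```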

Problem

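I want to check whether a district name appears in address. If it does, I want to create the variables district and borough, holding that district and its corresponding borough.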

Code attempts

(1)

for bor in boroughs.keys(): # loop over boroughs
   for distr in boroughs[bor]: # loop over borough's districts
      if distr in address[0]: # assign if district in address
         district = distr
         borough = bor
         break
      else:
         district = 'unknown'
         borough = 'unknown'

(1) doesn't work: everything ends up labelled 'unknown'.

I'm not sure whether I'm using break correctly, nor whether if distr in address[0]: is the right way to check for a match while iterating.
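On the second doubt: `distr in address[0]` is a plain substring test, so it does find a district, but it can also fire when the name is only part of a longer word. A word-boundary regex (`\b`) avoids that. The sketch below contrasts the two; the helper `contains_word` and the second address are hypothetical, for illustration only.

```python
import re

address = ['Brockham Drive, Brixton SW2']

# `in` on two strings is a plain substring test, so it finds the district ...
print('Brixton' in address[0])                 # True
# ... but it also fires when a name is only part of a longer word
# (hypothetical address, for illustration).
print('Barking' in 'Barkingside High Street')  # True


def contains_word(name, text):
    """Hypothetical helper: True if `name` appears in `text` as whole word(s)."""
    # \b marks a word boundary; re.escape neutralises regex metacharacters
    # such as '.' or '(' that could appear in a place name.
    return re.search(r'\b' + re.escape(name) + r'\b', text) is not None


print(contains_word('Brixton', address[0]))                 # True
print(contains_word('Barking', 'Barkingside High Street'))  # False
```

For the addresses in the question either test may be good enough; the regex only matters when one district name can occur inside another word.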

(2)

for bor in boroughs.keys(): # loop over boroughs
   for distr in boroughs[bor]: # loop over borough's districts
      district = re.search(r'\b'distr'\b', address[0]):    
      borough = ?
      break
   else:
      district = 'unknown'
      borough = 'unknown'

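With (2), I'm not sure how to iterate over 'bor' correctly when using '\b', nor how to assign the corresponding borough once the iteration produces a correct district match. I'm also unsure whether I should use (2) instead of (1).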
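A possible working version of attempt (2), as a sketch on hypothetical sample data. As written, `r'\b'distr'\b'` places a string literal directly next to a variable name, which Python rejects as a syntax error; the pattern has to be built by concatenation. The borough needs no extra lookup, since it is simply the outer loop variable `bor` at the moment of the match. Setting the defaults once, before the loops, keeps later non-matching districts from overwriting a hit, and the `for ... else: continue` / `break` idiom stops the outer loop only when the inner one broke.

```python
import re

# Hypothetical sample data in the shape used in the question.
address = ['Brockham Drive, Brixton SW2']
boroughs = {
    'Barnet': ['Arkley', 'Woodside Park'],
    'Lambeth': ['Brixton', 'Stockwell'],
    'Westminster': ['Bayswater', 'Westminster'],
}

district = borough = 'unknown'  # defaults, only overwritten on a match
for bor, districts in boroughs.items():
    for distr in districts:
        # Build the pattern by concatenation; re.escape guards against
        # characters in a district name that mean something in a regex.
        pattern = r'\b' + re.escape(distr) + r'\b'
        if re.search(pattern, address[0]):
            district, borough = distr, bor  # bor is the matching borough
            break
    else:
        continue  # inner loop found nothing: try the next borough
    break  # inner loop hit `break`: stop the outer loop as well

print(district, borough)  # Brixton Lambeth
```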

Which approach should I use, and how can I get at least one of them to work?

1 answer:

Answer 0 (score: 1)

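Your code attempt #1 is almost correct, but it is missing one key element. You only break out of the inner for loop, so your code then carries on through the outer for loop, and the districts of every remaining borough hit the else branch and overwrite district and borough with 'unknown'. Add a flag that records whether a match was found, and use it to break out of the outer loop as well.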

found = False

for bor in boroughs.keys(): # loop over boroughs
  for distr in boroughs[bor]: # loop over borough's districts
    if distr in address[0]: # assign if district in address
      district = distr
      borough = bor
      found = True
      break
    else:
      district = 'unknown'
      borough = 'unknown'
  if found:
    break
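Another way to leave both loops at once, sketched here as an alternative rather than as the answer's method, is to put the search in a function and `return` on the first match; the flag then becomes unnecessary and the 'unknown' defaults are returned only after every district has been tried. Matching uses `in` exactly as above. The function name `find_district` and the sample data are hypothetical.

```python
def find_district(address_line, boroughs):
    """Hypothetical helper: return (district, borough) for the first district
    name found in address_line, or ('unknown', 'unknown') if none match."""
    for bor, districts in boroughs.items():
        for distr in districts:
            if distr in address_line:
                return distr, bor  # `return` exits both loops at once
    return 'unknown', 'unknown'   # reached only when nothing matched


# Hypothetical sample data
boroughs = {
    'Barnet': ['Arkley', 'Woodside Park'],
    'Lambeth': ['Brixton', 'Stockwell'],
}
address = ['Brockham Drive, Brixton SW2']

district, borough = find_district(address[0], boroughs)
print(district, borough)                                  # Brixton Lambeth
print(find_district('1 High Street, Nowhere', boroughs))  # ('unknown', 'unknown')
```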