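Match a regular expression in Python and return the key

Time: 2019-08-19 21:08:08

Tags: python regex python-3.x dictionary

I have a nested dictionary and I'm having trouble matching a regular expression against its values. I need to iterate over the values and return the key of each value that the regex matches.

The nested dictionary looks like this: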

    user_info = {
        'user1': {'name': 'Aby',
                  'surname': 'Clark',
                  'description': 'Hi contact me by phone +1 548 5455 55 or facebook.com/aby.clark'},
        'user2': {'name': 'Marta',
                  'surname': 'Bishop',
                  'description': 'Nice to meet you text me'},
        'user3': {'name': 'Janice',
                  'surname': 'Valinise',
                  'description': 'You can contact me by phone +1 457 555667'},
        'user4': {'name': 'Helen',
                  'surname': 'Bush',
                  'description': 'You can contact me by phone +1 778 65422'},
        'user5': {'name': 'Janice',
                  'surname': 'Valinise',
                  'description': 'You can contact me by phone +1 457 5342327 or email janval@yahoo.com'},
    }

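So I need to iterate over the dictionary's values with a regex, find a match, and return the key where the match occurred.

The first problem I ran into was extracting the values from the nested dictionary, which I solved like this: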

    for key in user_info.keys():
        for values in user_info[key].values():
            print(values)

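That gives me each value from the nested dictionary. Is there a way to run a regex over these values so that, when it finds a match, it returns the key where the match occurred?

I tried the following: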

    for key in user_info.keys():
        for values in user_info.[key].values():

            #this regex match the email
            email = re.compile(r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)'.format(pattern), re.IGNORECASE|re.MULTILINE)
            match = re.match(email)

            if match is not None:
                print ("No values.")

         if found:
            return match

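Am I doing something wrong? I have been struggling with this for a week. Please tell me what is wrong and give me a hint on how to solve this problem. Thanks!

P.S. Yes, I searched Stack Overflow and Google and did not find a similar question.

2 answers:

Answer 0 (score: 0)

You can try using re.search instead of re.match, like this: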

import re

for key in user_info.keys():
    for values in user_info[key].values():
        # search() scans the whole string; match() only tries at the start
        email = re.search(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', values)
        if email is not None:
            print(key)
This code prints every key whose inner dictionary has a matching value.
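Since the question asks for the keys to be returned rather than printed, the same loop can be wrapped in a function. This is only a minimal sketch: the name keys_with_email, the constant EMAIL_RE, and the reduced sample dictionary are made up for illustration and do not come from the question. It uses any() so that each key is reported once even if several of its values match.

```python
import re

# Same email pattern as in the loop above, compiled once and reused.
EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

def keys_with_email(data):
    """Return the outer keys whose inner dict has a string value containing an email."""
    return [
        key
        for key, inner in data.items()
        # Skip non-string values so that search() never receives e.g. an int.
        if any(isinstance(v, str) and EMAIL_RE.search(v) for v in inner.values())
    ]

# Hypothetical, shortened stand-in for the question's user_info.
sample = {
    'user2': {'name': 'Marta', 'description': 'Nice to meet you text me'},
    'user5': {'name': 'Janice', 'description': 'phone +1 457 5342327 or email janval@yahoo.com'},
}

print(keys_with_email(sample))  # ['user5']
```

Calling keys_with_email(user_info) on the full dictionary from the question should work the same way, because every inner value there is a string.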

Note that the code you tried never uses values at all.
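A few other details in the attempt also seem to get in the way: re.match is called without a string to test, user_info.[key] is a syntax error, and the ^ and $ anchors require the whole text (or a whole line, with re.MULTILINE) to be an email address. The sketch below only illustrates the anchoring point; the variable names are chosen here for illustration, and the text is user5's description from the question.

```python
import re

text = 'You can contact me by phone +1 457 5342327 or email janval@yahoo.com'

unanchored = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
anchored = re.compile(r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$')

# match() only tries at position 0, where the text starts with "You", so it fails.
at_start = unanchored.match(text)
# search() scans the whole string and finds the address near the end.
anywhere = unanchored.search(text)
# With ^ and $ the entire text would have to be an email address, so this fails too.
whole_text = anchored.search(text)

print(at_start)          # None
print(anywhere.group())  # janval@yahoo.com
print(whole_text)        # None
```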

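Answer 1 (score: 0)

It looks like you want to extract emails from JSON values while also returning the matching key. Here are 2 solutions. The first is similar to yours; the second is generalized to any JSON with arbitrary nesting.

  1. Two for loops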
import re

user_info = {
  "user1": {
    "name": "Aby",
    "surname": "Clark",
    "description": "Hi contact me by phone +1 548 5455 55 or facebook.com/aby.clark"
  },
  "user2": {
    "name": "Marta",
    "surname": "Bishop",
    "description": "Nice to meet you text me"
  },
  "user3": {
    "name": "Janice",
    "surname": "Valinise",
    "description": "You can contact me by phone +1 457 555667"
  },
  "user4": {
    "name": "Helen",
    "surname": "Bush",
    "description": "You can contact me by phone +1 778 65422"
  },
  "user5": {
    "name": "Janice",
    "surname": "Valinise",
    "description": "You can contact me by phone +1 457 5342327 or email janval@yahoo.com",
  }
}

matches = []
for user, info in user_info.items():
    for key, value in info.items():
        # Raw string, so "\." reaches the regex engine without an invalid-escape warning
        emails = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", value)
        if emails:
            matches.append((f'{user}.{key}', emails))

print(matches)
# -> [('user5.description', ['janval@yahoo.com'])]

  2. Recursive approach for arbitrary JSON
import re

user_info = {
  "user1": {
    "name": "Aby",
    "surname": "Clark",
    "description": "Hi contact me by phone +1 548 5455 55 or janval@yahoo.com",
    "friends": [
      {
        "name": "Aby",
        "surname": "Clark",
        "description": "Hi contact me by phone +1 548 5455 55 or janval@yahoo.com",
      }
    ]
  }
}

def traverse(obj, keys=()):
  # keys is the path from the root down to obj; a tuple default avoids a shared mutable list
  if isinstance(obj, str):
    emails = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", obj)
    return [('.'.join(keys), emails)] if emails else []
  if isinstance(obj, dict):
    return [match for key, value in obj.items() for match in traverse(value, [*keys, key])]
  if isinstance(obj, list):
    return [match for i, value in enumerate(obj) for match in traverse(value, [*keys, str(i)])]
  return []

print(traverse(user_info, []))
# -> [('user1.description', ['janval@yahoo.com']), ('user1.friends.0.description', ['janval@yahoo.com'])]
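
If only the top-level key is needed, as in the original question, one possible variation is to keep each path as a tuple instead of a dotted string. The sketch below is an assumption-laden illustration, not part of the answer above: iter_email_paths, EMAIL_RE, the sample data, and the address a.b@example.com are all invented here. A tuple path stays unambiguous even when a key itself contains a dot, and the first element is the top-level key.

```python
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

def iter_email_paths(obj, path=()):
    """Yield (path, emails) for every string inside obj that contains an email."""
    if isinstance(obj, str):
        emails = EMAIL_RE.findall(obj)
        if emails:
            yield path, emails
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_email_paths(value, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_email_paths(value, path + (index,))
    # Any other type (numbers, None, ...) is ignored.

# Hypothetical data with one email nested inside a list.
data = {
    'user1': {'description': 'text me', 'friends': [{'description': 'mail a.b@example.com'}]},
    'user2': {'description': 'no contact details'},
}

hits = list(iter_email_paths(data))
print(hits)
# [(('user1', 'friends', 0, 'description'), ['a.b@example.com'])]

# The first path element is the top-level key the question asks for.
top_level_keys = sorted({path[0] for path, _ in hits})
print(top_level_keys)
# ['user1']
```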