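I have a nested dictionary and I am having trouble matching a regular expression against its values. I need to iterate over the values and return the keys of the values that the regex matched.
My nested dictionary looks like this: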
user_info = {
    'user1': {'name': 'Aby',
              'surname': 'Clark',
              'description': 'Hi contact me by phone +1 548 5455 55 or facebook.com/aby.clark'},
    'user2': {'name': 'Marta',
              'surname': 'Bishop',
              'description': 'Nice to meet you text me'},
    'user3': {'name': 'Janice',
              'surname': 'Valinise',
              'description': 'You can contact me by phone +1 457 555667'},
    'user4': {'name': 'Helen',
              'surname': 'Bush',
              'description': 'You can contact me by phone +1 778 65422'},
    'user5': {'name': 'Janice',
              'surname': 'Valinise',
              'description': 'You can contact me by phone +1 457 5342327 or email janval@yahoo.com'},
}
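So I need to run a regular expression over the dictionary's values, find a match, and return the key where the match occurred.
The first problem I ran into was extracting the values from the nested dictionary, but I solved that as follows: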
for key in user_info.keys():
    for values in user_info[key].values():
        print(values)
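That gives me each value from the nested dictionary. So is there a way to run a regex over these values so that it finds a match and returns the key where the match occurred?
I tried the following: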
for key in user_info.keys():
    for values in user_info.[key].values():
        # this regex matches the email
        email = re.compile(r'(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)'.format(pattern), re.IGNORECASE|re.MULTILINE)
        match = re.match(email)
        if match is not None:
            print ("No values.")
        if found:
            return match
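Am I doing something wrong? I have been struggling with this problem for a week. Please tell me what is wrong and give me a hint on how to solve it. Thank you!
P.S. Yes, I searched Stack Overflow and Google and did not find a similar question.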
Answer 0 (score: 0)
You can try using search instead of match, like this:
import re

for key in user_info.keys():
    for values in user_info[key].values():
        email = re.search(r'([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)+', values)
        if email is not None:
            print(key)
            break  # one match is enough; avoids printing the same key twice
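This code prints every key that has a matching inner value.
Note that the code you tried never uses values at all: re.match is called with only the compiled pattern and no string to match against.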
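To make the difference concrete, here is a minimal sketch, not a definitive implementation. re.match only tries to match at the start of the string, while re.search scans the whole string, which is why an email in the middle of a description is found only by search. The helper name keys_with_match and the small sample dictionary are assumptions introduced for illustration. The function also returns the keys instead of printing them, since a return statement only works inside a function.

```python
import re

# Pattern from the answer above, without the ^ and $ anchors, compiled once.
EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')

text = 'You can contact me by phone +1 457 5342327 or email janval@yahoo.com'
print(EMAIL_RE.match(text))           # None: match() only tries the start of the string
print(EMAIL_RE.search(text).group())  # janval@yahoo.com


def keys_with_match(data, pattern=EMAIL_RE):
    """Return the top-level keys whose inner dict has a string value matching pattern.

    Hypothetical helper, assuming data is a dict of flat dicts.
    """
    found = []
    for key, inner in data.items():
        if any(isinstance(v, str) and pattern.search(v) for v in inner.values()):
            found.append(key)
    return found


# Hypothetical sample data in the same shape as user_info.
sample = {
    'user2': {'name': 'Marta', 'description': 'Nice to meet you text me'},
    'user5': {'name': 'Janice', 'description': text},
}
print(keys_with_match(sample))  # ['user5']
```

The ^ and $ anchors in the original attempt would also prevent a match here: even with re.MULTILINE they require the email to fill an entire line.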
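Answer 1 (score: 0)
It looks like you want to extract emails from JSON values while also returning the key that matched. Here are two solutions. The first is similar to yours; the second is generalized to any JSON with arbitrary nesting levels.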
# Solution 1: two fixed levels (user -> field)
import re

user_info = {
    "user1": {
        "name": "Aby",
        "surname": "Clark",
        "description": "Hi contact me by phone +1 548 5455 55or facebook.com/aby.clark"
    },
    "user2": {
        "name": "Marta",
        "surname": "Bishop",
        "description": "Nice to meet you text me"
    },
    "user3": {
        "name": "Janice",
        "surname": "Valinise",
        "description": "You can contact me by phone +1 457 555667"
    },
    "user4": {
        "name": "Helen",
        "surname": "Bush",
        "description": "You can contact me by phone +1 778 65422"
    },
    "user5": {
        "name": "Janice",
        "surname": "Valinise",
        "description": "You can contact me by phone +1 457 5342327 or email janval@yahoo.com",
    }
}

matches = []
for user, info in user_info.items():
    for key, value in info.items():
        # Raw string so that \. reaches the regex engine unchanged
        emails = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", value)
        if emails:
            matches.append((f'{user}.{key}', emails))

print(matches)
# -> [('user5.description', ['janval@yahoo.com'])]
# Solution 2: recursive, works for any nesting of dicts and lists
import re

user_info = {
    "user1": {
        "name": "Aby",
        "surname": "Clark",
        "description": "Hi contact me by phone +1 548 5455 55or janval@yahoo.com",
        "friends": [
            {
                "name": "Aby",
                "surname": "Clark",
                "description": "Hi contact me by phone +1 548 5455 55or janval@yahoo.com",
            }
        ]
    }
}

def traverse(obj, keys=()):
    # keys holds the path to obj; a tuple default avoids a mutable default argument
    if isinstance(obj, str):
        emails = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", obj)
        return [('.'.join(keys), emails)] if emails else []
    if isinstance(obj, dict):
        return [match for key, value in obj.items() for match in traverse(value, [*keys, key])]
    if isinstance(obj, list):
        return [match for i, value in enumerate(obj) for match in traverse(value, [*keys, str(i)])]
    return []

print(traverse(user_info, []))
# -> [('user1.description', ['janval@yahoo.com']), ('user1.friends.0.description', ['janval@yahoo.com'])]
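As a variation on the recursive solution, the sketch below yields each path as a tuple instead of a dotted string. This is one possible design under stated assumptions, not the answer's own method. The function name iter_matches and the sample data are hypothetical. Tuple paths stay unambiguous when a key itself contains a dot, and they make it easy to recover the top-level key that the question asked for.

```python
import re

# Same email pattern as above, compiled once.
EMAIL_RE = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')


def iter_matches(obj, pattern=EMAIL_RE, path=()):
    """Yield (path, matches) for every string in obj that contains a match.

    Hypothetical helper: path is a tuple of dict keys and list indexes.
    """
    if isinstance(obj, str):
        found = pattern.findall(obj)
        if found:
            yield path, found
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_matches(value, pattern, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_matches(value, pattern, path + (index,))


# Hypothetical sample data, not taken from the question.
data = {
    'user1': {
        'description': 'email janval@yahoo.com',
        'friends': [{'description': 'write to aby.clark@example.com'}],
    },
    'user2': {'description': 'Nice to meet you text me'},
}

results = list(iter_matches(data))
print(results)

# The first path element is the top-level key where a match occurred.
matched_users = sorted({path[0] for path, _ in results})
print(matched_users)  # ['user1']
```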