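I want to extract the phone, fax and mobile numbers from any given text string. If one of them is not present in the string, an empty string can be returned for it. Example strings: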
ex1 = "miramar road margie shoop san diego ca 12793 manager phone 6035550160 fax 6035550161 mobile 6035550178 marsgies travel wwwmarpiestravelcom"
ex2 = "david packard electrical engineering 350 serra mall room 170 phone 650 7259327 stanford university fax 650 723 1882 stanford california 943059505 ulateecestanfordedu"
ex3 = "stanford electrical engineering vijay chandrasekhar electrical engineering 17 comstock circle apt 101 stanford ca 94305 phone 9162210411"
A regex like this works:
phone_regex = re.match(".*phone(.*)fax(.*)mobile(.*)",ex1)
phone = [re.sub("[^0-9]","",x) for x in phone_regex.groups()][0]
mobile = [re.sub("[^0-9]","",x) for x in phone_regex.groups()][2]
fax = [re.sub("[^0-9]","",x) for x in phone_regex.groups()][1]
Result for ex1:
phone = 6035550160
fax = 6035550161
mobile = 6035550178
ex2 has no mobile entry, so I get:
Traceback (most recent call last):
  phone = [re.sub("[^0-9]","",x) for x in phone_regex.groups()][0]
AttributeError: 'NoneType' object has no attribute 'groups'
Question
I need a better regex solution, since I am new to regular expressions, or, as an alternative, a way to catch the AttributeError and assign an empty string.
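The fallback mentioned in the question could look roughly like the sketch below. The function name `extract_numbers` is hypothetical. Note the limitation: because the pattern requires all three keywords in that order, a string such as ex2 yields three empty strings even though it does contain a phone and a fax number.

```python
import re

def extract_numbers(text):
    """Return (phone, fax, mobile) as digit strings.

    Hypothetical helper: falls back to three empty strings when the
    all-or-nothing pattern from the question does not match.
    """
    match = re.match(r".*phone(.*)fax(.*)mobile(.*)", text)
    try:
        groups = match.groups()
    except AttributeError:  # re.match returned None
        return "", "", ""
    # Keep only the digits of each captured chunk.
    phone, fax, mobile = (re.sub(r"[^0-9]", "", g) for g in groups)
    return phone, fax, mobile
```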
Answer 0 (score: 2)
You can use a simple re.findall like this:
dict(re.findall(r'\b({})\s*(\d+)'.format("|".join(keys)), ex))
The regex looks like
\b(phone|fax|mobile)\s*(\d+)
Pattern details
\b - a word boundary
(phone|fax|mobile) - Group 1: one of the listed words
\s* - zero or more whitespace characters
(\d+) - Group 2: one or more digits
See the Python demo:
import re
exs = ["miramar road margie shoop san diego ca 12793 manager phone 6035550160 fax 6035550161 mobile 6035550178 marsgies travel wwwmarpiestravelcom",
"david packard electrical engineering 350 serra mall room 170 phone 650 7259327 stanford university fax 650 723 1882 stanford california 943059505 ulateecestanfordedu",
"stanford electrical engineering vijay chandrasekhar electrical engineering 17 comstock circle apt 101 stanford ca 94305 phone 9162210411"]
keys = ['phone', 'fax', 'mobile']
for ex in exs:
    res = dict(re.findall(r'\b({})\s*(\d+)'.format("|".join(keys)), ex))
    print(res)
Output:
{'fax': '6035550161', 'phone': '6035550160', 'mobile': '6035550178'}
{'fax': '650', 'phone': '650'}
{'phone': '9162210411'}
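The second line of the output shows that `\d+` stops at the first space, so numbers written in several digit groups are cut short, and missing keys are simply absent. One possible variation, sketched below, allows whitespace-separated digit groups and fills in empty strings for absent keys. The names `KEYS`, `PATTERN` and `extract_contacts` are assumptions made for illustration, and the approach would wrongly merge a number with an unrelated number that directly follows it.

```python
import re

KEYS = ["phone", "fax", "mobile"]
# Keyword, optional whitespace, then one or more digit groups
# separated by whitespace (e.g. "650 723 1882").
PATTERN = re.compile(r"\b({})\s*(\d+(?:\s+\d+)*)".format("|".join(KEYS)))

def extract_contacts(text):
    """Return a dict with every key in KEYS; missing entries map to ''."""
    found = {
        key: re.sub(r"\s+", "", number)  # join the digit groups
        for key, number in PATTERN.findall(text)
    }
    return {key: found.get(key, "") for key in KEYS}
```

Calling `extract_contacts(ex)` for each string in `exs` then always returns all three keys, so callers do not need to handle a missing entry separately.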
Answer 1 (score: 1)
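I think I understand what you want: it is about getting exactly the first match after a keyword. In that case, what you need is the question mark ?:
"'?' is also a quantifier. It is short for {0,1}. It means 'match zero or one of the group preceding this question mark.' It can also be interpreted as the part preceding the question mark being optional."
If the definition is not enough, here is some code that should work: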
import re
res_dict = {}
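list_keywords = ['phone', 'mobile', 'fax']  # 'mobile' is the keyword used in the question's strings
for i_key in list_keywords:
    # Lazily capture up to the next space followed by a letter, or up to the end of the string
    temp_res = re.findall(i_key + r'(.*?)(?: [a-zA-Z]|$)', ex1)
    res_dict[i_key] = temp_res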
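A small point worth spelling out: when `?` directly follows another quantifier, as in `.*?`, it does not mean "optional" but makes that quantifier lazy, so it matches as little as possible. The sketch below contrasts the two behaviours on a made-up input string (`sample` is purely illustrative).

```python
import re

sample = "phone 123 fax 456 end"  # hypothetical input

# Greedy: .* runs as far as it can, then backtracks to the LAST
# space that is followed by a letter.
greedy = re.search(r"phone(.*) [a-z]", sample).group(1)

# Lazy: .*? stops at the FIRST space that is followed by a letter.
lazy = re.search(r"phone(.*?) [a-z]", sample).group(1)

print(repr(greedy))  # ' 123 fax 456'
print(repr(lazy))    # ' 123'
```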
Answer 2 (score: 1)
Use re.search
Demo:
import re
ex1 = "miramar road margie shoop san diego ca 12793 manager phone 6035550160 fax 6035550161 mobile 6035550178 marsgies travel wwwmarpiestravelcom"
ex2 = "david packard electrical engineering 350 serra mall room 170 phone 650 7259327 stanford university fax 650 723 1882 stanford california 943059505 ulateecestanfordedu"
ex3 = "stanford electrical engineering vijay chandrasekhar electrical engineering 17 comstock circle apt 101 stanford ca 94305 phone 9162210411"
for i in [ex1, ex2, ex3]:
    # Capture everything after the keyword, up to the next letter or the end of the string
    phone = re.search(r"(?P<phone>(?<=\bphone\b).*?(?=([a-z]|$)))", i)
    if phone:
        print("Phone:", phone.group("phone").strip())
    fax = re.search(r"(?P<fax>(?<=\bfax\b).*?(?=([a-z]|$)))", i)
    if fax:
        print("Fax:", fax.group("fax").strip())
    mob = re.search(r"(?P<mob>(?<=\bmobile\b).*?(?=([a-z]|$)))", i)
    if mob:
        print("mob:", mob.group("mob").strip())
    print("-----")
Output:
Phone: 6035550160
Fax: 6035550161
mob: 6035550178
-----
Phone: 650 7259327
Fax: 650 723 1882
-----
Phone: 9162210411
-----
Answer 3 (score: 1)
I think the following regular expressions should work fine:
mobile = re.findall('mobile([0-9]*)', ex1.replace(" ",""))[0]
fax = re.findall('fax([0-9]*)', ex1.replace(" ",""))[0]
phone = re.findall('phone([0-9]*)', ex1.replace(" ",""))[0]
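Indexing with `[0]` raises an IndexError when a keyword is absent, for example `mobile` in ex2. A hedged variation that returns an empty string instead might look like this; `first_number` is a hypothetical helper name. As with the lines above, stripping all spaces first can glue neighbouring words together, so a keyword could in principle match inside a merged word.

```python
import re

def first_number(keyword, text):
    """Return the digits that directly follow keyword, or '' if absent.

    Hypothetical helper: spaces are removed first, so digit groups
    such as "650 723 1882" are read as a single number.
    """
    compact = text.replace(" ", "")
    match = re.search(re.escape(keyword) + r"([0-9]+)", compact)
    return match.group(1) if match else ""
```

With this helper, `first_number("mobile", ex2)` gives an empty string rather than an exception.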