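I am trying to write a Python script that uses the re module to parse the address, phone and email out of a long string in which the values are separated by line breaks. The string contains two containers (records). When I run my script, it only gives me the result for the first container, and the result still includes parts I don't want. I am not sure whether the approach I tried below is even a valid attempt. Any help would be highly appreciated.
I have tried: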
import re
rstr = """
Address The Westshore Grand,
A Tribute Portfolio Hotel, Tampa
Telephone 52 70 90 00
E-mail info.suchona@gmail.com
Address hotels near 1255 north palm ave
sarasota florida
Telephone 62 40 80 00
E-mail info.niit@hotmail.com
"""
address = re.findall(r'(Address.+)',rstr)[0].strip()
phone = re.findall(r'(Telephone.+)',rstr)[0].strip()
email = re.findall(r'(E-mail.+)',rstr)[0].strip()
print(f'{address}\n{phone}\n{email}')
The result I get:
Address The Westshore Grand,
Telephone 52 70 90 00
E-mail info.suchona@gmail.com
The result I would like to get:
The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com
hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com
Although I know this can be done with string manipulation, I would like to stick to the regex way. Thanks.
Answer 0 (score: 1)
Try this regex to get your address.
address = re.findall(r'(?<=Address).*?(?=Telephone)',rstr, flags=re.DOTALL)
Demo:
address = re.findall(r'(?<=Address).*?(?=Telephone)',rstr, flags=re.DOTALL)
phone = re.findall(r'(Telephone.+)',rstr)
email = re.findall(r'(E-mail.+)',rstr)
for i in zip(address, phone, email):
    print('{address}\n{phone}\n{email}'.format(address=i[0].strip(), phone=i[1].strip(), email=i[2].strip()))
    print("-----")
Output:
The Westshore Grand,
A Tribute Portfolio Hotel, Tampa
Telephone 52 70 90 00
E-mail info.suchona@gmail.com
-----
hotels near 1255 north palm ave
sarasota florida
Telephone 62 40 80 00
E-mail info.niit@hotmail.com
-----
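As a side note on why the flag matters, here is a minimal sketch on a made-up one-record string (the sample text is hypothetical, not the asker's data). It shows that the lookaround pattern finds nothing without re.DOTALL, and that the same lookbehind idea can also drop the Telephone label, which the demo above leaves in place.

```python
import re

# Hypothetical miniature of the input: one record whose address spans two lines.
sample = "Address 12 Example Road,\nSpringfield\nTelephone 11 22 33 44\n"

# Without re.DOTALL, "." stops at the first newline, so the lazy match can
# never reach "Telephone" and nothing is found.
without_flag = re.findall(r'(?<=Address).*?(?=Telephone)', sample)

# With re.DOTALL, "." also matches "\n", so the match runs across lines.
with_flag = re.findall(r'(?<=Address).*?(?=Telephone)', sample, flags=re.DOTALL)

# The same lookbehind idea drops the "Telephone" label from the phone line.
phones = re.findall(r'(?<=Telephone).+', sample)

print(without_flag)
print([a.strip() for a in with_flag])
print([p.strip() for p in phones])
```

The address still contains its internal line break here; collapsing it into a single line needs an extra whitespace substitution.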
Answer 1 (score: 0)
Make the capture group in your regex surround only the part you want. re.findall() returns all occurrences of the matched pattern, so you can simply loop through them like so (assuming all three pieces of information are always present, and that each address block is followed by a blank line, which the \n\n in the address pattern relies on):
address = re.findall(r'Address(.+?)\n\n', rstr, flags=re.S)
phone = re.findall(r'Telephone(.+)', rstr)
email = re.findall(r'E-mail(.+)', rstr)
for i in range(len(address)):
    print('\n'.join([
        # Raw string avoids an invalid-escape warning; \s+ also collapses a lone newline
        re.sub(r'\s+', ' ', address[i].strip()),
        phone[i].strip(),
        email[i].strip()
    ]))
Output:
The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com
hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com
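If you would rather not keep three parallel lists in sync, one alternative is a single pattern with named groups, applied with re.finditer. This is only a sketch: the names record and records are mine, and it assumes the three fields always appear in the order Address, Telephone, E-mail. It does not depend on blank lines between the fields.

```python
import re

rstr = """
Address The Westshore Grand,
A Tribute Portfolio Hotel, Tampa
Telephone 52 70 90 00
E-mail info.suchona@gmail.com
Address hotels near 1255 north palm ave
sarasota florida
Telephone 62 40 80 00
E-mail info.niit@hotmail.com
"""

# One pattern for a whole record. re.S lets "." cross newlines inside the
# address; [^\n]+? keeps the phone number on a single line.
record = re.compile(
    r'Address\s+(?P<address>.+?)\s*'
    r'Telephone\s+(?P<phone>[^\n]+?)\s*'
    r'E-mail\s+(?P<email>\S+)',
    flags=re.S,
)

records = [
    {
        # Collapse the line break inside the address into a single space.
        'address': re.sub(r'\s+', ' ', m.group('address')),
        'phone': m.group('phone'),
        'email': m.group('email'),
    }
    for m in record.finditer(rstr)
]

for r in records:
    print(r['address'], r['phone'], r['email'], sep='\n')
```

A record with a missing field simply produces no match instead of shifting the other lists out of step, which may or may not be what you want.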
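Answer 2 (score: 0)
You want . to match line breaks as well: use re.DOTALL
You also want to grab everything between Address and Telephone, but non-greedily: .+?
In addition, you want to capture it as a group, so wrap it in ()
Replace every run of whitespace with a single space: re.sub
Result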
addresses = [re.sub(r'\s+', ' ', addr).strip()  # strip() drops the space left by the trailing newline
             for addr in re.findall(r'Address (.+?)Telephone', rstr, re.DOTALL)]
Output
['The Westshore Grand, A Tribute Portfolio Hotel, Tampa',
'hotels near 1255 north palm ave sarasota florida']
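To see why the non-greedy quantifier matters, here is a small sketch on a made-up two-record string (hypothetical data, shortened for illustration). It compares .+ with .+? under re.DOTALL.

```python
import re

# Hypothetical two-record input, shortened for illustration.
text = "Address A1\nA2\nTelephone 111\nAddress B1\nTelephone 222\n"

# Greedy: ".+" runs to the LAST "Telephone", swallowing both records.
greedy = re.findall(r'Address (.+)Telephone', text, re.DOTALL)

# Non-greedy: ".+?" stops at the FIRST "Telephone" after each "Address".
lazy = re.findall(r'Address (.+?)Telephone', text, re.DOTALL)

print(len(greedy), len(lazy))
print([re.sub(r'\s+', ' ', a).strip() for a in lazy])
```

With the greedy version you get a single match that contains the first phone number and the second address, so the per-record split is lost.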
Do the same for the other two fields:
phones = re.findall(r'Telephone\s*(.+)\s*', rstr)
emails = re.findall(r'E-mail\s*(.+)\s*', rstr)
Then you can loop over them:
for addr, phone, email in zip(addresses, phones, emails):
    print(addr, phone, email, sep='\n', end='\n\n')
Output
The Westshore Grand, A Tribute Portfolio Hotel, Tampa
52 70 90 00
info.suchona@gmail.com
hotels near 1255 north palm ave sarasota florida
62 40 80 00
info.niit@hotmail.com