Question

What is the best way to split this in Python? (address, city, state, zip)

<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>

In some cases the zip code has only five digits:

 <div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634</div>

Answer 1

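Depending on how strict or lenient you want to be about the aspects that can't be deduced from a single example, something like the following should work: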

import re

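# The -NNNN suffix is optional so that both zip code forms from the question match.
s = re.compile(r'^<div.*?>([^<]+)<br.*?>([^,]+), (\w\w) (\d{5}(?:-\d{4})?)</div>$')
mo = s.match(thestring)
if mo is None:
  raise ValueError('No match for %r' % thestring)
# zip_code rather than zip, to avoid shadowing the built-in zip().
address, city, state, zip_code = mo.groups()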
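
A minimal sketch, assuming Python 3, of the same regex idea wrapped in a function, with named groups and an optional +4 suffix so that both zip code forms from the question are accepted. The names ADDRESS_RE and parse_address are made up for illustration, and the pattern is only as general as the two samples shown:

```python
import re

# Hypothetical pattern: (?:-\d{4})? makes the +4 part of the zip code optional.
ADDRESS_RE = re.compile(
    r'^<div[^>]*>(?P<address>[^<]+)<br\s*/?>'
    r'(?P<city>[^,]+), (?P<state>[A-Z]{2}) (?P<zip_code>\d{5}(?:-\d{4})?)</div>$'
)

def parse_address(html):
    """Return a dict with address, city, state and zip_code, or raise ValueError."""
    mo = ADDRESS_RE.match(html.strip())  # strip() tolerates stray surrounding whitespace
    if mo is None:
        raise ValueError('No match for %r' % html)
    return mo.groupdict()

print(parse_address('<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634</div>'))
```

Returning a dict keeps the call site readable (result['city']) and avoids relying on the order of the groups.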

Answer 2

Just a hint: there are much better ways to parse HTML than regular expressions, for example Beautiful Soup.

Here's why you shouldn't do that with regular expressions

Edit: Oh well, @teepark linked it first. :)
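
To illustrate the parser-based route, here is a rough sketch that swaps in the standard library's html.parser for Beautiful Soup so that it runs without third-party packages. TextCollector and text_lines are hypothetical names, and the sketch assumes the fragment holds just the two text lines shown in the question:

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

def text_lines(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks

lines = text_lines('<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>')
print(lines)  # the street line first, then the "City, ST ZIP" line
```

The parser deals with the markup (attributes, the self-closing br), so only the second text line still needs to be split into city, state and zip.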

Answer 3

Combining Beautiful Soup and regular expressions gives you something like this:

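import re
from bs4 import BeautifulSoup  # Beautiful Soup 4 (the bs4 package)

thestring = r'<div class="adtxt">7616 W Belmont Ave<br />Chicago, IL 60634-3225</div>'
re0 = re.compile(r'(?P<address>[^<]+)')
# The -NNNN suffix is optional so that five-digit zip codes also match.
re1 = re.compile(r'(?P<city>[^,]+), (?P<state>\w\w) (?P<zip>\d{5}(?:-\d{4})?)')
soup = BeautifulSoup(thestring, 'html.parser')
# soup.div.contents is [text, <br/>, text]: index 0 is the street line, index 2 the city line.
(address,) = re0.search(soup.div.contents[0]).groups()
city, state, zip_code = re1.search(soup.div.contents[2]).groups()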
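
If the city line always has the form "City, ST ZIP", plain string methods may be enough for that part once the text has been extracted. A sketch under that assumption, with split_city_line as a hypothetical helper name:

```python
def split_city_line(line):
    """Split 'City, ST ZIP' into (city, state, zip_code) using only str methods."""
    # Everything before the first comma is the city; the remainder is "ST ZIP".
    city, _, rest = line.partition(',')
    # split() with no arguments ignores extra whitespace; unpacking raises
    # ValueError if the remainder is not exactly two tokens.
    state, zip_code = rest.split()
    return city.strip(), state, zip_code

print(split_city_line('Chicago, IL 60634-3225'))
print(split_city_line('Chicago, IL 60634'))
```

This treats the zip code as an opaque token, so both the five-digit and the +4 form pass through unchanged. It does not validate either one.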
