Question

import re
import urllib
p = urllib.urlopen("http://sprunge.us/QZhU")
page = p.read()
pos = page.find("<h2><span>")
print page[pos:pos+48]
c = re.compile(r'<h2><span>(.*)</span>')
print c.match(page).group(1)

When I run it:

shadyabhi@archlinux $ python2 temp.py 
<h2><span>House.S08E02.HDTV.XviD-LOL.avi</span> 
Traceback (most recent call last):
  File "temp.py", line 8, in <module>
    print c.match(page).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
shadyabhi@archlinux $

If I can find the string with string.find, what goes wrong when I use a regular expression? I tried reading http://docs.python.org/howto/regex.html#regex-howto but it didn't help.

Answer 1

match only matches at the beginning of the string. Use search, finditer, or findall instead.
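A minimal sketch of the difference between match and search. The `page` string below is a made-up stand-in for the downloaded HTML (the real page content is not shown in the question), so treat it as an assumption:

```python
import re

# Hypothetical stand-in for the downloaded page; not the real page content.
page = "<html><body><h2><span>House.S08E02.HDTV.XviD-LOL.avi</span></h2></body></html>"

pattern = re.compile(r'<h2><span>(.*)</span>')

# match() only tries the pattern at position 0, and the string starts with "<html>".
match_result = pattern.match(page)

# search() scans forward for the first position where the pattern fits.
search_result = pattern.search(page)

print(match_result)            # None
print(search_result.group(1))  # House.S08E02.HDTV.XviD-LOL.avi
```

Because match returns None here, calling .group(1) on it raises the AttributeError shown in the question.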

Also note that * is greedy. You probably want to change the regex to r'<h2><span>(.*?)</span>'.
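To illustrate why greediness matters, here is a small sketch on a made-up line containing two span elements (the `line` value is hypothetical and only serves the demonstration):

```python
import re

# Hypothetical input with two <span> elements on one line.
line = "<h2><span>first.avi</span> <span>second.avi</span></h2>"

# Greedy: .* grabs as much as it can, so the match runs to the LAST </span>.
greedy = re.search(r'<h2><span>(.*)</span>', line).group(1)

# Non-greedy: .*? stops at the FIRST </span> that lets the pattern succeed.
lazy = re.search(r'<h2><span>(.*?)</span>', line).group(1)

print(greedy)  # first.avi</span> <span>second.avi
print(lazy)    # first.avi
```

Since . does not match a newline by default, the greedy version only overshoots when several closing tags sit on the same line, which is why it can appear to work on some pages.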

Putting it together, the following works for me:

import re
import urllib
p = urllib.urlopen("http://sprunge.us/QZhU")
page = p.read()
pos = page.find("<h2><span>")
print page[pos:pos+48]
c = re.compile(r'<h2><span>(.*?)</span>')
print c.search(page).group(1)

Regular expression fails to find data in a string, while string.find() works fine

1 answer: