Avoiding errors when trying to crawl the web

Posted: 2016-03-16 09:59:19

Tags: python web-crawler urllib

I am trying to crawl a web page in Python. The problem is that the re module raises an error. Here is my code:

#Python27

import re
import urllib

htm = urllib.urlopen('http://www.4shared.com/')
wp = htm.read()
begin = []
for start_tag in wp:
    x = re.search('<a', wp)
    begin.append(x.end())

This is the error message I get:

Traceback (most recent call last):
  File "E:/Python Essen/Python27/crawling_web_pratice.py", line 23, in <module>
    a = re.search(start_tag, wp)
  File "C:\Python27\lib\re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "C:\Python27\lib\re.py", line 251, in _compile
    raise error, v # invalid expression
error: nothing to repeat

How can I avoid this error?
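A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing line is `re.search(start_tag, wp)`, and because `for start_tag in wp` iterates over a string, `start_tag` is a single character of the page. When that character is a regex metacharacter such as `*`, the re module rejects it as a pattern with "nothing to repeat". The example below assumes the goal is to collect the end offset of every `<a` in the page; the helper name `find_tag_ends` and the sample HTML are hypothetical, and a hard-coded string stands in for the downloaded page so that no network access is needed.

```python
import re


def find_tag_ends(html, tag='<a'):
    """Return the end offset of every literal occurrence of `tag` in `html`."""
    # re.escape makes any regex metacharacters in `tag` match literally.
    pattern = re.compile(re.escape(tag))
    # finditer walks through all matches instead of returning only the first.
    return [match.end() for match in pattern.finditer(html)]


# Using a lone '*' as the whole pattern reproduces the reported error class.
try:
    re.search('*', 'a * b')
    bad_pattern_failed = False
except re.error:
    bad_pattern_failed = True

# Hypothetical sample standing in for the result of htm.read().
sample = '<p><a href="/x">x</a> * <a href="/y">y</a></p>'
positions = find_tag_ends(sample)
```

Searching once over the whole string with `finditer` avoids looping over individual characters, and `re.escape` keeps the search safe if the text being looked for ever contains characters like `*`, `+` or `?`. For real pages, an HTML parser is usually more robust than regular expressions, but that goes beyond what the question asks.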

0 answers:

No answers yet