Question

TypeError: expected string or buffer

from BeautifulSoup import BeautifulSoup
import urllib2
import re
html_page = urllib2.urlopen("http://kteq.in/services")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
   result = re.sub(r"http\S+", "", link.get('href'))
   print result
   print "____________________________________________________"

When I run the code above, it raises a TypeError on line 7. I have not been able to fix it. Any suggestions?

Answer 1

Try printing each href value before passing it to re.sub.

for link in soup.findAll('a'):
    print(link.get('href'))
    result = re.sub(r"http\S+", "", link.get('href'))

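You will see a None value after a few links have been printed. re.sub accepts only strings, so passing it None raises the TypeError.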
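The failure can be reproduced without fetching the page. Here is a minimal sketch, in which the `hrefs` list is made-up sample data standing in for the values returned by link.get('href'), with None representing an <a> tag that has no href attribute:

```python
import re

# Hypothetical sample data (not taken from the real page):
# None stands for an <a> tag without an href attribute.
hrefs = ["/about", "http://example.com/page", None]

errors = []
for href in hrefs:
    try:
        re.sub(r"http\S+", "", href)
    except TypeError as exc:
        # Record which value failed and the message Python reported.
        errors.append((href, str(exc)))

print(errors)
```

Only the None entry should end up in `errors`. The wording of the message differs between Python versions, so the sketch records it rather than comparing it to a fixed string.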

You can fix this by adding an if condition inside the loop that skips those links:

for link in soup.findAll('a'):
    href = link.get('href')
    print(href)
    # <a> tags without an href attribute yield None, which re.sub rejects
    if href is None:
        continue
    result = re.sub(r"http\S+", "", href)
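The same check can be pulled into a small helper so it can be tried without a network connection. This is only a sketch: `strip_urls` and the `sample` list are hypothetical names and data introduced for illustration, not part of the original code.

```python
import re

URL_PATTERN = re.compile(r"http\S+")

def strip_urls(hrefs):
    """Remove http... text from each href, skipping missing (None) values."""
    results = []
    for href in hrefs:
        if href is None:
            # No href attribute on this tag; nothing to clean.
            continue
        results.append(URL_PATTERN.sub("", href))
    return results

# Made-up sample values standing in for link.get('href') results.
sample = ["/services", None, "http://example.com/contact", "mailto:info@example.com"]
print(strip_urls(sample))  # ['/services', '', 'mailto:info@example.com']
```

Note that an absolute URL is reduced to an empty string by this pattern, so the caller may also want to skip empty results. As an alternative to the explicit None check, Beautiful Soup can be asked to return only tags that have the attribute, for example soup.findAll('a', href=True).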
