Question

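I get the following error:

Traceback (most recent call last):
  File "stack.py", line 31, in ?
    print >> out, "%s" % escape(p)
  File "/usr/lib/python2.4/cgi.py", line 1039, in escape
    s = s.replace("&", "&amp;") # Must be done first!
TypeError: 'NoneType' object is not callable

from the following code: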

import urllib2
from cgi import escape  # Important!
from BeautifulSoup import BeautifulSoup

def is_talk_anchor(tag):
    return tag.name == "a" and tag.findParent("dt", "thumbnail")

def talk_description(tag):
    return tag.name == "p" and tag.findParent("h3")

links = []
desc = []

for pagenum in xrange(1, 5):
    soup = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks?page=%d" % pagenum))
    links.extend(soup.findAll(is_talk_anchor))
    page = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks/arvind_gupta_turning_trash_into_toys_for_learning.html"))
    desc.extend(soup.findAll(talk_description))

out = open("test.html", "w")

print >>out, """<html><head><title>TED Talks Index</title></head>
<body>
<table>
<tr><th>#</th><th>Name</th><th>URL</th><th>Description</th></tr>"""

for x, a in enumerate(links):
  print >> out, "<tr><td>%d</td><td>%s</td><td>http://www.ted.com%s</td>" % (x + 1, escape(a["title"]), escape(a["href"]))

for y, p in enumerate(page):
  print >> out, "<td>%s</td>" % escape(p)

print >>out, "</tr></table>"

I think the problem is with % escape(p). I am trying to pull out the contents of the <p>. Should I not be using escape?

I am also having a problem with this line:

page = BeautifulSoup(urllib2.urlopen("%s") % a["href"])

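This is what I actually want to do, but it also raises an error, and I am wondering whether there is another way to do it. I am just trying to take the links I found in the earlier lines and run each of them through BeautifulSoup again.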
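One thing visible in that line, offered as a sketch rather than a confirmed diagnosis: the `%` operator is applied to whatever `urlopen("%s")` returns, not to the URL string, so the literal text `%s` is what gets passed to `urlopen`. The hrefs also appear to be site-relative, since the table code above prefixes them with `http://www.ted.com`. The helper name `build_talk_url` and the sample href below are hypothetical.

```python
BASE_URL = "http://www.ted.com"  # prefix already used in the table-writing code


def build_talk_url(href):
    """Return an absolute URL for a site-relative href such as "/talks/x.html"."""
    # Format the string first; the finished URL is what urlopen should receive.
    return "%s%s" % (BASE_URL, href)


# Hypothetical href, shaped like the values stored in links.
url = build_talk_url("/talks/example_talk.html")
print(url)
```

With a helper like this, the fetch would read `page = BeautifulSoup(urllib2.urlopen(build_talk_url(a["href"])))`, assuming every `a["href"]` really is site-relative.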

Answer 1

You have to investigate (using pdb) why one of your links is returned as a None instance.

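In particular: the traceback speaks for itself. escape() is being called with None, so you have to find out which argument is None; it is one of the items in 'links'. So why is one of your items None?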
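As a lighter-weight alternative to stepping through with pdb, a short check like the following can show which values reaching escape() are not plain strings. This is only a sketch: `describe_items` and the sample list are hypothetical, and it is written for Python 3 (under Python 2 the check would use `basestring` instead of `str`).

```python
def describe_items(items):
    """Return (index, type name) for every item that is not a plain string."""
    suspects = []
    for index, item in enumerate(items):
        if not isinstance(item, str):
            suspects.append((index, type(item).__name__))
    return suspects


# Hypothetical data standing in for the values passed to escape().
sample = ["first talk", None, "third talk"]
for index, type_name in describe_items(sample):
    print("item %d is %s, not a string" % (index, type_name))
```

Running the same check over the real values before the `print >> out` loop would show whether any of them is None or some other non-string object.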

Probably because your function

def is_talk_anchor(tag):
   return tag.name == "a" and tag.findParent("dt", "thumbnail")

returns None, because tag.findParent("dt", "thumbnail") returns None (due to the given HTML input).
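To illustrate how that expression can evaluate to None: Python's `and` returns the last operand it evaluates rather than a strict boolean. The sketch below uses `FakeTag`, a hypothetical stand-in class that is not part of BeautifulSoup, plus a variant (`is_talk_anchor_bool`, also hypothetical) that always returns True or False.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed tag; not part of BeautifulSoup."""

    def __init__(self, name, parent=None):
        self.name = name
        self._parent = parent

    def findParent(self, *args):
        # Mimics the "no match" case by returning None when there is no parent.
        return self._parent


def is_talk_anchor(tag):
    # Same expression as in the question: `and` returns its last evaluated operand.
    return tag.name == "a" and tag.findParent("dt", "thumbnail")


def is_talk_anchor_bool(tag):
    # Variant that always returns True or False.
    return tag.name == "a" and tag.findParent("dt", "thumbnail") is not None


orphan = FakeTag("a")               # an <a> with no matching parent
print(is_talk_anchor(orphan))       # prints None
print(is_talk_anchor_bool(orphan))  # prints False
```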

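So you have to check for or filter out the None items in 'links' (or adjust the parser code above), so that only the links that actually exist are extracted, according to your needs.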
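A minimal sketch of that filtering step, assuming the values to be escaped are either strings or None. The function name `escape_present` and the sample titles are hypothetical; the import falls back to `cgi.escape` so the same code also runs on Python 2, as used in the question.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape   # Python 2, as in the question


def escape_present(items):
    """Escape every item that is not None and skip the rest."""
    return [escape(item) for item in items if item is not None]


# Hypothetical values; the second one stands in for a missing result.
titles = ["Trash & toys", None, "<b>Learning</b>"]
print(escape_present(titles))
```

Skipping the None entries silently is a design choice; logging or counting them instead may be preferable if a missing item indicates a parsing problem.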

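Please read your traceback carefully and think about what the problem could be. Tracebacks are very helpful and give you valuable information about your problem.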

BeautifulSoup error (CGI escape)
