Question

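When I crawl the page https://github.com/rg3/youtube-dl/pull/11272 with Scrapy 1.6 and select with this XPath:

//div[@class = 'file js-comment-container js-resolvable-timeline-thread-container']

(If you try it in a browser or another tool, don't forget to block JS.)

the result is something that has no attribute 'extract_first'.

For example, running this code raises that error: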

for code_and_comment in response.xpath(
        "//div[@class = 'file js-comment-container js-resolvable-timeline-thread-container']"):
    if code_and_comment is None:
        print('it is NONE')
    print(code_and_comment.extract_first())

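I don't understand this. Do you know where I went wrong? Thanks in advance.

Note: Yes, I know about robots.txt, and ROBOTSTXT_OBEY is even set to False.
Note 2: I don't think dynamic JavaScript is the problem. I tried the XPath in my browser with JavaScript disabled and it worked fine.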

Answer 1

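The reason is that in your code code_and_comment is already a single Selector, so calling extract_first on it makes no sense. That method exists only on a SelectorList (which is what you get back from response.xpath(...)); iterating over the list yields individual Selector objects.

You can do the following: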

for code_and_comment in response.xpath(
        "//div[@class = 'file js-comment-container js-resolvable-timeline-thread-container']"):
    if code_and_comment is None:
        print('it is NONE')
    print(code_and_comment.extract())

Scrapy AttributeError: 'Selector' object has no attribute 'extract_first'
