Question

I am using a Scrapy crawler to extract some details such as username, upvotes, join date, etc.

I am using XPath to extract the content from each user's web page.

Code:

import scrapy
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from scrapy.spiders import BaseSpider
from scrapy.http import FormRequest
from loginform import fill_login_form
from scrapy.selector import Selector
from scrapy.http import HtmlResponse

class UserSpider(scrapy.Spider):
    name = 'userspider'
    start_urls = ['http://forum.nafc.org/login/']
    #Getting the list of usernames
    user_names = ['Bob', 'Tom']  #List of Usernames

    def __init__(self, *args, **kwargs):
        super(UserSpider, self).__init__(*args, **kwargs)

    def parse(self, response):
        return [FormRequest.from_response(response,
                    formdata={'registerUserName': 'user', 'registerPass': 'password'},
                    callback=self.after_main_login)]

    def after_main_login(self, response):
        for user in self.user_names:
            user_url = 'profile/' + user
            yield response.follow(user_url, callback=self.parse_user_pages)

    def parse_user_pages(self, response):
        yield{
            "USERNAME": response.xpath('//div[contains(@class, "main") and contains(@class, "no-sky-main")]/h1[contains(@class, "thread-title")]/text()').extract_first()
            "UPVOTES": response.xpath('//div[contains(@class, "proUserInfoLabelLeft") and @id="proVotesCap"]/text()').extract()[0]
        }

if __name__ == "__main__":
    spider = UserSpider()

Running this gives a syntax error.

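P.S. I have manually checked the syntax of my XPath expressions in the Scrapy shell, and they work fine.

Is there something in my code that I am not noticing?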

Answer 1

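You are missing a comma (,) after the first dict element. Without it, Python sees the "UPVOTES" string directly after the extract_first() call, which is a syntax error no matter how valid the XPath expressions are: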

{"USERNAME": response.xpath(...).extract_first(),
 "UPVOTES": response.xpath(...).extract()[0]}

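If it helps to confirm that this is a pure Python syntax problem and has nothing to do with Scrapy or XPath, here is a minimal sketch that parses the dict with and without the comma using the standard library `ast` module. The shortened XPath strings and the helper name `syntax_error_of` are made up for illustration; nothing here imports Scrapy or runs the spider.

```python
import ast

# Shortened stand-in for the spider's callback, WITHOUT the comma.
# The XPath strings are placeholders, not the ones from the question.
BROKEN = '''
def parse_user_pages(response):
    yield {
        "USERNAME": response.xpath("//h1/text()").extract_first()
        "UPVOTES": response.xpath("//div/text()").extract_first()
    }
'''

# The same callback WITH the comma after the first dict element.
FIXED = '''
def parse_user_pages(response):
    yield {
        "USERNAME": response.xpath("//h1/text()").extract_first(),
        "UPVOTES": response.xpath("//div/text()").extract_first()
    }
'''


def syntax_error_of(source):
    """Hypothetical helper: return the SyntaxError raised while parsing
    `source`, or None if the source parses cleanly. The code is only
    parsed, never executed."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return exc
    return None


if __name__ == "__main__":
    for label, source in (("broken", BROKEN), ("fixed", FIXED)):
        err = syntax_error_of(source)
        if err is None:
            print(label, "parses without errors")
        else:
            # The exact message and line number vary between Python versions.
            print(label, "fails at line", err.lineno, "with:", err.msg)
```

The same check can be run on the real spider file from the command line with `python -m py_compile` followed by the file name, which reports the syntax error without starting a crawl.
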