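I am following a YouTube tutorial: Scraping Web Pages with Scrapy
It is old and written for Python 2.x, while I am learning 3.x. So far I have been able to solve the problems I ran into by googling, but now I am getting this error: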
  File "/usr/lib64/python3.5/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/skeer/PycharmProjects/scrape_craigslists/scrape_cl/scrape_cl/spiders/scrape.py", line 11, in parse
    xpath = scrapy.selector(response)
TypeError: 'module' object is not callable
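Googling earlier, I found reports from other people who hit this because of a lowercase character, as if the 's' in selector should be uppercase. I tried that and got an error saying the scrapy.Selector module could not be found.
Here is my code: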
from scrapy.spider import Spider
import scrapy.selector

class MySpider(Spider):
    name = "craigslist"
    allowed_domains = ["craigslist.org"]
    start_urls = ["https://helena.craigslist.org/search/sad"]

    def parse(self, response):
        xpath = scrapy.selector(response)
        titles = xpath.select("//p")
        for titles in titles:
            title = xpath("/body/section/form/div/li/p[@class]()").extract()
            link = xpath("/body/section/form/div/ul/li/a[@href]").extract()
            print (title, link)
Answer 0 (score: 1)
I suggest learning from the official docs and curated resources.
For your problem, have a look at the official docs for Scrapy Selectors:
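from scrapy.spiders import Spider
from scrapy.selector import Selector

class MySpider(Spider):
    ...

    def parse(self, response):
        # Selector is the class; scrapy.selector is the module that contains it
        xpath = Selector(response)
        ...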
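The error itself is not specific to Scrapy, so here is a minimal sketch that reproduces it with the standard library only. It assumes that decimal (a module) and decimal.Decimal (a class inside it) are a fair stand-in for scrapy.selector and Selector; the variable names are made up for illustration.

```python
import decimal                 # a module, like scrapy.selector
from decimal import Decimal    # a class inside it, like Selector

message = None
try:
    # Calling the module itself fails, just like scrapy.selector(response)
    decimal("1.5")
except TypeError as exc:
    message = str(exc)

# Calling the class works, just like Selector(response)
value = Decimal("1.5")

print(message)
print(value)
```

The capitalization matters only because the module and the class happen to have similar names; what fixes the error is calling the class rather than the module.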
Answer 1 (score: 0)
scrapy.selector is the module that contains the selector classes, not a selector itself. Try
from scrapy.selector import Selector
However, this is not necessary, because the response object already has a selector interface and an xpath method, so you should do this instead:
def parse(self, response):
    xpath = response.xpath
    titles = xpath("//p")
    for titles in titles:
        title = xpath("/body/section/form/div/li/p[@class]()").extract()
        link = xpath("/body/section/form/div/ul/li/a[@href]").extract()
        print (title, link)
Also, if you plan to scrape craigslist, you will need a very good list of proxies. They ban IPs quickly, specifically to prevent scraping.
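One more note on the loop in the parse method: the loop variable is never used, and the two absolute paths are evaluated against the whole page on every pass, so each iteration returns the same lists. The usual pattern is to query relative to the current node. The sketch below shows that idea with the standard library's xml.etree.ElementTree instead of Scrapy, purely so it runs without a crawl; the markup, the DOC constant and the extract_rows helper are all made up for illustration and do not reflect craigslist's real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a listing page
DOC = """
<ul>
  <li><p class="title">Sysadmin wanted</p><a href="/sad/1.html">view</a></li>
  <li><p class="title">Network engineer</p><a href="/sad/2.html">view</a></li>
</ul>
"""

def extract_rows(markup):
    """Return one (title, link) pair per <li>, looked up relative to that <li>."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall("./li"):
        # The leading "./" keeps each lookup inside the current item
        title = item.find("./p[@class]")
        link = item.find("./a[@href]")
        rows.append((title.text, link.get("href")))
    return rows

rows = extract_rows(DOC)
for title, link in rows:
    print(title, link)
```

In a Scrapy spider the same shape would be a loop over the selected nodes with a relative XPath called on each node, rather than on the response.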
Answer 2 (score: 0)
Change the function definition:
def parse(self, response):
    # Build the selector from the Selector class, not from the module
    xpath = scrapy.selector.Selector(response)
    # .xpath() is the current name for the deprecated .select()
    titles = xpath.xpath("//p")
    for titles in titles:
        title = xpath.xpath("/body/section/form/div/li/p[@class]()").extract()
        link = xpath.xpath("/body/section/form/div/ul/li/a[@href]").extract()
        print(title, link)
Note the change: xpath("/body/section/form/div/li/p[@class]()")
-> xpath.xpath("/body/section/form/div/li/p[@class]()")
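The reason for that change is that here the name xpath holds a selector object, and an object is not callable just because it has an xpath method. In the other answer, xpath = response.xpath stores the bound method itself, which is why calling xpath(...) directly works there. A minimal sketch of the difference, using a made-up FakeSelector class (it is not Scrapy's Selector and does no real XPath matching):

```python
class FakeSelector:
    """Hypothetical stand-in for a selector object; not Scrapy's class."""

    def __init__(self, items):
        self.items = items

    def xpath(self, query):
        # Pretend lookup: return the stored items equal to the query string
        return [item for item in self.items if item == query]


sel = FakeSelector(["p", "a", "p"])

instance_error = None
try:
    # Calling the object itself fails: it defines no __call__
    sel("p")
except TypeError as exc:
    instance_error = str(exc)

# Calling the method on the object works (the xpath.xpath(...) form)
method_result = sel.xpath("p")

# Storing the bound method first also works (the xpath = response.xpath form)
bound = sel.xpath
bound_result = bound("p")

print(instance_error)
print(method_result, bound_result)
```

Either form is fine as long as the name you call refers to the method and not to the object that owns it.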