Question

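I want to run a crawler on a server, but my Python isn't that good...

My source code works fine when I run it in my laptop's terminal, but it fails with an error when I run it in the server's terminal.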

Here is my source code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from thehack.items import NowItem
import time

class MySpider(BaseSpider):
    name = "nowhere"
    allowed_domains = ["n0where.net"]
    start_urls = ["https://n0where.net/"]

    def parse(self, response):
        for article in response.css('.loop-panel'):
            item = NowItem()
            item['title'] = article.css('.article-title::text').extract_first()
            item['link'] = article.css('.loop-panel>a::attr(href)').extract_first()
            item['body'] = ''.join(article.css('.excerpt p::text').extract()).strip()
            # date is not used
            #item['date'] = article.css('[itemprop="datePublished"]::attr(content)').extract_first()
            yield item
            time.sleep(5)

Does anyone know how to fix it? Thanks a lot in advance :)

Answer 1

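It looks like the Scrapy version on your server is outdated. The selector method .extract_first() was only added in Scrapy 1.1, so you'll want to upgrade the scrapy package on your server.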
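If upgrading on the server (for example with pip install --upgrade scrapy) is not an option, a possible workaround is to avoid .extract_first() altogether and take the first element of .extract() yourself. The sketch below is only an illustration: first_or_none is a hypothetical helper name, not part of Scrapy, and it assumes nothing beyond the selector list having an .extract() method that returns a list of strings.

```python
def first_or_none(selector_list, default=None):
    """Return the first extracted value, or `default` if nothing matched.

    Hypothetical helper (not a Scrapy API): it relies only on .extract(),
    which older Scrapy releases already provide on selector lists.
    """
    values = selector_list.extract()
    return values[0] if values else default
```

In the spider, the title line would then read item['title'] = first_or_none(article.css('.article-title::text')), assuming the rest of the spider already runs on the server's Scrapy version.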
