Question

I'm building an API that returns the text scraped by Scrapy as a JsonResponse. When I run the Scrapy script on its own, it works perfectly. But when I integrate it into Django, I get no output.

All I want is to return the scraped text in the response to the request (in my case, a POST request sent from Postman).

Here is the code I'm trying:

from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
import scrapy
from scrapy.crawler import CrawlerProcess


@csrf_exempt
def some_view(request, username):
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
        'LOG_ENABLED': 'false'
    })
    process_test = process.crawl(QuotesSpider)
    process.start()

    return JsonResponse({'return': process_test})


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/random',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        return response.css('.text::text').extract_first()

I'm very new to Python and Django. Any help would be appreciated.

Answer 1

In your code, process_test is the Deferred returned by process.crawl(), not the output of the crawl.

You need additional configuration so that the spider stores its output "somewhere". See this SO Q&A for information on writing a custom pipeline.

If you just want to fetch and parse a single page synchronously, you're better off using requests to fetch the page and then parsel to parse it.

signal only works in main thread: scrapy
