Question

I'm building an application with Django and Scrapy to scrape some websites.

Here is my views.py in Django; it captures the POST value and runs the script when crawlit() is called:

from django.shortcuts import render
from django.http import HttpResponse as hres
from .MetaCrawl import InitCrawl
import json


def index( req ):
    return render( req, 'crawler/index.html' )


def crawlit( req ):
    val = req.POST['value'] #https://stackoverflow.com
    InitCrawl( val )
    return hres( json.dumps({'message':'Sent!'}) )

At the same level as views.py, I have a file named MetaCrawl.py:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging


class Spider( scrapy.Spider ):
    name="project"
    links = []
    start_urls = links

    def __init__( self, l ):
        self.links.append( l )

    def parse( self, response ):
        for item in response.xpath('//a'):
            print(item)


class InitCrawl:
    def __init__( self, link ):
        configure_logging()
        crawler = CrawlerRunner()
        crawler.crawl( Spider, link )
        d = crawler.join()
        d.addBoth( lambda _: reactor.stop() )
        reactor.run()

I get this error:

builtins.ValueError: signal only works in main thread

Sometimes I get this error instead:

ValueError: signal only works in main thread
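
For context, this message is raised by Python's standard signal module when a handler is registered from any thread other than the main one. Twisted's reactor.run() installs signal handlers by default, and Django may handle a request in a worker thread, which would explain the traceback. The sketch below reproduces the same ValueError with the standard library only; the function names are made up for illustration, and it assumes nothing about the project above.

```python
import signal
import threading


def install_handler():
    """Register a SIGINT handler, as an event loop might do on startup."""
    signal.signal(signal.SIGINT, signal.default_int_handler)


def run_in_worker_thread():
    """Call install_handler() from a non-main thread.

    Returns the ValueError that was raised, or None if nothing was raised.
    """
    caught = []

    def target():
        try:
            install_handler()
        except ValueError as exc:
            caught.append(exc)

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return caught[0] if caught else None


if __name__ == "__main__":
    # Prints the exception raised inside the worker thread.
    print(repr(run_in_worker_thread()))
```

The exact wording of the message varies slightly between Python versions, but it mentions the main thread in each case.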

I followed the Scrapy Spiders tutorial, but I'm lost. How can I solve this?
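
One direction that is sometimes suggested for this kind of error, offered here only as an untested idea for this project, is to start the crawl in a separate process so that the reactor runs in that process's own main thread. The sketch below uses only the standard library. CHILD_CODE is a hypothetical stand-in that just installs a signal handler, which is the step that fails in a worker thread; a real script would start the Scrapy crawl there instead.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for a crawl script: it only installs a signal
# handler. A real script would run the crawl at this point (not shown).
CHILD_CODE = (
    "import signal\n"
    "signal.signal(signal.SIGINT, signal.default_int_handler)\n"
    "print('handler installed')\n"
)


def run_child():
    """Run CHILD_CODE in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return result.stdout.strip()


def run_child_from_worker_thread():
    """Launch the child from a non-main thread, as a threaded view might."""
    output = []
    worker = threading.Thread(target=lambda: output.append(run_child()))
    worker.start()
    worker.join()
    return output[0]


if __name__ == "__main__":
    print(run_child_from_worker_thread())
```

The child process has its own main thread, so installing the handler succeeds there even though the parent launched it from a worker thread. Whether this fits a given Django deployment (blocking the request, passing the URL, collecting results) is left open here.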

Builtins error when calling Scrapy - Django

0 answers: