Question

I am trying to run a Scrapy crawler with scrapyrt. I get the following response in the browser:

 {"status": "error", "message": "", "code": 500} response: 1

and this in the scrapyrt window:

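I tried editing the path of the log file, but then it throws a Permission denied error.

The crawler runs successfully (it creates the HTML file), but no JSON response comes back through curl.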

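    // Build the query string so the nested target URL is properly encoded.
    $query = http_build_query(array(
        'spider_name' => 'dmoz',
        'url'         => 'http://www.dmoz.org/Computers/Programming/Languages/Ada/',
    ));

    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_PORT           => 9080,
        CURLOPT_URL            => "http://localhost/crawl.json?" . $query,
        // Without this option curl_exec() prints the body itself and returns true,
        // which is why "response: 1" appears after the JSON in the browser.
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_USERAGENT      => $_SERVER['HTTP_USER_AGENT'],
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_CONNECTTIMEOUT => 120,
        CURLOPT_TIMEOUT        => 120,
        CURLOPT_POST           => false,
    ));
    $response = curl_exec($curl);
    $err = curl_error($curl);

    curl_close($curl);

    if ($err) { echo "cURL Error #:" . $err; }
    else { echo "response: " . $response; }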
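For comparison, the same request can be sketched in Python 3 using only the standard library. This is a minimal sketch, assuming scrapyrt listens on localhost:9080 (as in the PHP code) and that an error reply looks like the `{"status": "error", ...}` object shown at the top; the helper names `build_crawl_url`, `parse_crawl_response` and `crawl` are hypothetical. Encoding the nested `url` parameter keeps the query string unambiguous, and checking `status` turns the silent 500 into an explicit exception.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: scrapyrt listens on localhost:9080, as in the PHP code.
SCRAPYRT_ENDPOINT = "http://localhost:9080/crawl.json"


def build_crawl_url(spider_name, start_url, endpoint=SCRAPYRT_ENDPOINT):
    """Build a crawl.json GET URL; urlencode escapes the nested start URL."""
    query = urlencode({"spider_name": spider_name, "url": start_url})
    return endpoint + "?" + query


def parse_crawl_response(body):
    """Decode a scrapyrt JSON reply; raise if it reports "status": "error"."""
    data = json.loads(body)
    if data.get("status") == "error":
        raise RuntimeError(
            "scrapyrt error (code %s): %r" % (data.get("code"), data.get("message"))
        )
    return data


def crawl(spider_name, start_url, timeout=120):
    """Send the request and return the decoded JSON (needs a running server)."""
    url = build_crawl_url(spider_name, start_url)
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read()
    except HTTPError as err:
        # Assumption: an error reply may arrive with HTTP status 500;
        # its JSON body is still readable from the exception.
        body = err.read()
    return parse_crawl_response(body.decode("utf-8"))
```

Usage would be `crawl("dmoz", "http://www.dmoz.org/Computers/Programming/Languages/Ada/")`; with the log-file problem present it should raise instead of printing an empty error.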

If the same crawler is run from the Scrapy command line:

    scrapy crawl dmoz -a url="http://www.dmoz.org/Computers/Programming/Languages/Ada/"

the output is:

    {'description': u'ACM Special Interest Group on Ada: information on SIGAda organization and pointers to current information and resources for the Ada programming language.', 'name': u'SIGAda', 'url': u'http://www.sigada.org/'}

Answer 1

Solved the problem:

I updated the file "C:\Python27\Lib\site-packages\scrapyrt\log.py" as follows.

Replace

    filename = settings.get('LOG_FILE')

with this:

    filename = "C:\\wamp64\\www\\dirbot-master\\logs\\dmoz\\log.log"

dirbot-master is the Scrapy project. Now I get the response in the browser.
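Hardcoding the path inside site-packages works, but the edit is lost whenever scrapyrt is reinstalled or upgraded. A less brittle variant is sketched below, assuming the Permission denied error came from a log location the process cannot write to: keep the configured `LOG_FILE` when its directory exists and is writable, otherwise fall back to a directory you control. `resolve_log_file` is a hypothetical helper, not part of scrapyrt, and it avoids `exist_ok` so it also runs on the Python 2.7 install used here. Note that `os.access` is only a rough check on Windows, where it does not evaluate ACLs.

```python
import os


def resolve_log_file(configured_path, fallback_dir):
    """Return a writable log file path (hypothetical helper, not scrapyrt API).

    Uses configured_path when its directory exists and is writable;
    otherwise creates fallback_dir if needed and returns fallback_dir/log.log.
    """
    if configured_path:
        directory = os.path.dirname(os.path.abspath(configured_path))
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return configured_path
    # Fall back to a directory we control; create it on first use.
    if not os.path.isdir(fallback_dir):
        os.makedirs(fallback_dir)
    return os.path.join(fallback_dir, "log.log")
```

In `log.py` the hardcoded line would then become something like `filename = resolve_log_file(settings.get('LOG_FILE'), r"C:\wamp64\www\dirbot-master\logs\dmoz")`.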

scrapyrt not receiving the response from the Scrapy crawler
