I am trying to run a scrapy crawler through scrapyrt. I get the following response in the browser:
{"status": "error", "message": "", "code": 500} response: 1
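and this in the scrapyrt window.
I tried editing the path of the log file, but that throws a Permission denied error.
The crawler itself runs successfully (it creates the html file), but I do not receive the JSON response in curl.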
$curl = curl_init();
curl_setopt_array($curl, array(
    // scrapyrt listens on port 9080 by default
    CURLOPT_PORT => '9080',
    CURLOPT_URL => "http://localhost/crawl.json?spider_name=dmoz&url=http://www.dmoz.org/Computers/Programming/Languages/Ada/",
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_USERAGENT => $_SERVER['HTTP_USER_AGENT'],
    CURLOPT_AUTOREFERER => true,
    CURLOPT_CONNECTTIMEOUT => 120,
    CURLOPT_TIMEOUT => 120,
    CURLOPT_POST => false
));
// CURLOPT_RETURNTRANSFER is not set, so curl_exec() prints the body directly
// and returns true on success, which is why the output ends with "response: 1".
$response = curl_exec($curl);
$err = curl_error($curl);
curl_close($curl);
if ($err) { echo "cURL Error #:" . $err; }
else { echo "response: " . $response; }
If I run the same crawler from the scrapy command line with scrapy crawl dmoz -a url="http://www.dmoz.org/Computers/Programming/Languages/Ada/"
the output is
{'description': u'ACM Special Interest Group on Ada: information on SIGAda organization and pointers to current information and resources for the Ada programming language.',
'name': u'SIGAda',
'url': u'http://www.sigada.org/'}
Answer 0 (score: 1)
Solved the problem:
I updated the file "C:\Python27\Lib\site-packages\scrapyrt\log.py" as follows.
Replace
filename = settings.get('LOG_FILE')
with this
filename = "C:\\wamp64\\www\\dirbot-master\\logs\\dmoz\\log.log"
dirbot-master is the scrapy project. Now I receive the response in the browser.
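
Hard-coding the path inside site-packages works, but the change is lost whenever scrapyrt is reinstalled or upgraded. A less invasive variant might be to keep reading LOG_FILE from the settings and only fall back to another location when that path cannot be used. The sketch below is one way to do that, assuming the Permission denied error came from a log directory that was missing or not writable. The helper name resolve_log_file and the fallback file name scrapyrt.log are made up for illustration and are not part of scrapyrt.

```python
import os
import tempfile


def resolve_log_file(configured_path, fallback_name="scrapyrt.log"):
    """Return a log file path whose directory exists and is writable.

    configured_path: value read from the LOG_FILE setting (may be None).
    fallback_name: hypothetical file name used when that path is unusable.
    """
    if configured_path:
        full_path = os.path.abspath(configured_path)
        log_dir = os.path.dirname(full_path)
        if not os.path.isdir(log_dir):
            try:
                # Create the missing directory tree for the log file.
                os.makedirs(log_dir)
            except OSError:
                log_dir = None
        # Use the configured path only if its directory accepts writes.
        if log_dir and os.access(log_dir, os.W_OK):
            return full_path
    # Otherwise fall back to the system temp directory.
    return os.path.join(tempfile.gettempdir(), fallback_name)
```

With a helper like this, the replaced line could become filename = resolve_log_file(settings.get('LOG_FILE')), so the project's own LOG_FILE setting still decides where the log goes whenever it points to a usable location.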