I'm trying to use node-horseman for web scraping, since it makes PhantomJS easier to work with. But I'm stuck at one point.
The gist of my code:
https://gist.github.com/matheus-rossi/bc4c688264be072ded4ff7ee3f933bc2.js
As you can see, if I run exactly the same code in the browser, everything works, as in this image:
[Image: code running OK in the browser]
But if I run the code in node-horseman, I get this:
Unhandled rejection eval@[native code]
evaluate
global code
evaluateJavaScript@[native code]
evaluate@phantomjs://platform/webpage.js:390:39
phantomjs://code/bridge.js:121:61
at Horseman.<anonymous> (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/node-horseman/lib/actions.js:839:38)
at Horseman.tryCatcher (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:512:31)
at Promise._settlePromise (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:569:18)
at Promise._settlePromiseCtx (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/promise.js:606:10)
at Async._drainQueue (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:138:12)
at Async._drainQueues (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:143:10)
at Immediate.Async.drainQueues (/home/matheus/Documentos/NodeJs/node-horseman/node_modules/bluebird/js/release/async.js:17:14)
at runCallback (timers.js:781:20)
at tryOnImmediate (timers.js:743:5)
at processImmediate [as _immediateCallback] (timers.js:714:5)
var Horseman = require('node-horseman')
var horseman = new Horseman()
horseman
  .open('http://www.angeloni.com.br/super/index')
  .status()
  .evaluate(function(){
    const descNode = document.querySelectorAll('.descr a')
    const desc = Array.prototype.map.call(descNode, function (t) { return t.textContent })
    const valueNode = document.querySelectorAll('.price a')
    const value = Array.prototype.map.call(valueNode, function (t) { return t.textContent })
    const finalData = []
    for (let i=0 ; i < desc.length; i ++) {
      let item = {}
      item['desc'] = desc[i]
      item['value'] = value[i]
      finalData.push(item)
    }
    return finalData
  })
  .then(function(finalData){
    console.log(finalData)
  })
  .close()
Edit: after adding .catch to the promise chain, I got this new information:
message: 'Expected an identifier but found \'item\' instead',
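Answer 0 (score: 1)
What you're missing is that PhantomJS runs JavaScript in a different environment than Node. As in many browsers, not all of the nice ES6 language features are available in that environment (yet).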
If I run your code, I get an error from PhantomJS about the use of let. Changing those declarations to var makes your code work for me.
Also, adding a .catch() after the promise is a good idea, because you then get better error messages, which would have been useful in this case.
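For illustration, here is a minimal sketch of the evaluate callback rewritten with ES5 syntax only. The answer only reports that replacing let with var was enough; swapping const for var as well is a conservative assumption on my part, not something the answer requires. The function name collectItems and its optional doc parameter are hypothetical, added so the logic can also be exercised outside PhantomJS.

```javascript
// ES5-only version of the callback passed to .evaluate().
// `collectItems` and the `doc` parameter are hypothetical names; when the
// function runs inside the page with no arguments, it falls back to the
// page's global `document`.
function collectItems(doc) {
  doc = doc || document;
  var textOf = function (node) { return node.textContent; };
  var desc = Array.prototype.map.call(doc.querySelectorAll('.descr a'), textOf);
  var value = Array.prototype.map.call(doc.querySelectorAll('.price a'), textOf);
  var finalData = [];
  // Pair each description with the price at the same index.
  for (var i = 0; i < desc.length; i++) {
    finalData.push({ desc: desc[i], value: value[i] });
  }
  return finalData;
}

// Usage with the chain from the question (needs node-horseman and PhantomJS):
// horseman
//   .open('http://www.angeloni.com.br/super/index')
//   .evaluate(collectItems)
//   .then(function (finalData) { console.log(finalData); })
//   .catch(function (err) { console.error(err); })
//   .close();
```

Keeping the callback free of let, arrow functions, and other ES6 syntax matters because it is executed by PhantomJS's own engine, not by Node.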