request() function returns undefined values

Time: 2017-05-24 16:08:08

Tags: javascript google-chrome-extension web-scraping cors web-crawler

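So, I'm building a Google Chrome extension that notifies me whenever new grades are posted to my university transcript for any of my courses. Right now I'm trying to crawl and scrape the URL repeatedly and compare the result with the last iteration (...? Advice on this would be greatly appreciated!). Currently, when I use the request() function, even asynchronously, the callback receives undefined for both response and body, and if I try to console.log the error I get yet another strange error.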

Here is the error I get afterwards:

bundle.js:24 Uncaught TypeError: Cannot read property 'headers' of undefined
    at Request._callback (bundle.js:24)
    at self.callback (bundle.js:54273)
    at Request.EventEmitter.emit (bundle.js:95413)
    at Request.start (bundle.js:54842)
    at Request.end (bundle.js:55610)
    at end (bundle.js:54652)
    at bundle.js:54666
    at Item.run (bundle.js:103974)
    at drainQueue (bundle.js:103944)

Here is my code (I changed the URL so you don't see my school's login URL):

var Crawler = require("simplecrawler"),
    url = require("url"),
    cheerio = require("cheerio"),
    request = require("request");

var initialURL = "https://www.fakeURL.com/";


var crawler = new Crawler(initialURL);

request("https://www.fakeURL.com/", {
    // The jar option isn't necessary for simplecrawler integration, but it's
    // the easiest way to have request remember the session cookie between this
    // request and the next
    jar: true,
    mode: 'no-cors'
}, function(error, response, body) {
    // Start by saving the cookies. We'll likely be assigned a session cookie
    // straight off the bat, and then the server will remember the fact that
    // this session is logged in as user "iamauser" after we've successfully
    // logged in

    crawler.cookies.addFromHeaders(response.headers["set-cookie"]);

    // We want to get the names and values of all relevant inputs on the page,
    // so that any CSRF tokens or similar things are included in the POST
    // request
    var $ = cheerio.load(body),
        formDefaults = {},
        // You should adapt these selectors so that they target the
        // appropriate form and inputs
        formAction = $("#login").attr("action"),
        loginInputs = $("input");

    // We loop over the input elements and extract their names and values so
    // that we can include them in the login POST request
    loginInputs.each(function(i, input) {
        var inputName = $(input).attr("name"),
            inputValue = $(input).val();

        formDefaults[inputName] = inputValue;
    });

    // Time for the login request!
    request.post(url.resolve(initialURL, formAction), {
        // We can't be sure that all of the input fields have a correct default
        // value. Maybe the user has to tick a checkbox or something similar in
        // order to log in. This is something you have to find this out manually
        // by logging in to the site in your browser and inspecting in the
        // network panel of your favorite dev tools what parameters are included
        // in the request.
        form: Object.assign(formDefaults, {
            username: "secretusername",
            password: "secretpassword"
        }),
        // We want to include the saved cookies from the last request in this
        // one as well
        jar: true
    }, function(error, response, body) {
        // That should do it! We're now ready to start the crawler
        crawler.interval = 10000; // 10 seconds; use 600000 for 10 minutes
        crawler.maxConcurrency = 1; // 1 active check at a time
        crawler.maxDepth = 5;
        crawler.start();
    });
});

crawler.on("fetchcomplete", function(queueItem, responseBuffer, response) {
    console.log("Fetched", queueItem.url, responseBuffer.toString());
});

// crawler.interval = 600000 // 10 minutes
// crawler.maxConcurrency = 1; // 1 active check at a time
// crawler.maxDepth = 5;
//
// crawler.start();

One thing to note is that I added the 'no-cors' mode to my request so that I would stop running into CORS whenever I test it. Could that be what's causing this problem?

Thanks!

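Edit: I'm using Browserify so that require() works in the browser. I can't post the actual code from bundle.js because it is very long and wouldn't fit here. Just wanted to clarify that. Thanks!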

EDIT2: This is what I get when I console.log the error:

Error: Invalid value for opts.mode
    at new module.exports (bundle.js:108605)
    at Object.http.request (bundle.js:108428)
    at Object.https.request (bundle.js:97056)
    at Request.start (bundle.js:54843)
    at Request.end (bundle.js:55613)
    at end (bundle.js:54655)
    at bundle.js:54669
    at Item.run (bundle.js:103977)
    at drainQueue (bundle.js:103947)

1 Answer:

Answer 0: (score: 0)

As James said, if you get an error, inspect it by logging it to the console or by whatever method you prefer for showing debugging information.

If you get Cannot read property 'headers' of undefined, then, as you said, response is undefined, so the first line of your callback fails because it tries to access response.headers.
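To make that concrete, here is a minimal sketch of a guard around the header lookup. getSetCookieHeaders is a hypothetical helper name, not part of request or simplecrawler; it only assumes the (error, response) arguments that the request callback in the question receives.

```javascript
// Hypothetical helper (not part of request or simplecrawler).
// Returns the Set-Cookie headers of a response, or an empty array when the
// request failed and `response` is undefined.
function getSetCookieHeaders(error, response) {
    if (error || !response || !response.headers) {
        // Reading response.headers here would throw the TypeError from the question.
        return [];
    }
    return response.headers["set-cookie"] || [];
}
```

Inside the question's callback, the result could then be handed to crawler.cookies.addFromHeaders instead of reading response.headers directly.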

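The simple way to debug this is to console.log() the error before reaching the offending line (because execution stops there), so you just need to add console.log(error); as the first line of your callback.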

The way to go:

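Although you can fix whatever problem console.log(error) shows you, this code is doomed to fail because you never check whether you received an error and simply assume the request completed successfully. Network connections are messy and a request can fail for many reasons, so before accessing response.headers you must check whether an error occurred and log it (or show it to your client, retry the request after X seconds, whatever you prefer).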

Tip: if you have a callback with an error parameter, check it. There is a reason it is the first parameter.

The code would look like this:

request("https://www.fakeURL.com/", {
    jar: true,
    mode: 'no-cors'
}, function(error, response, body) {
    if (error) {
        console.log(error);
        makeTheRequestAgainIn(5000); // Milliseconds
    } else {
        doWhateverWith(response, body);
    }
});
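makeTheRequestAgainIn and doWhateverWith above are placeholders. One possible way to fill in the retry part is sketched below; requestWithRetry and its options (retries, delayMs, schedule) are hypothetical names made up for this example, and doRequest stands for any function that calls a Node-style callback(error, response, body).

```javascript
// Hypothetical retry wrapper; none of these names come from the request module.
// doRequest(callback) must call callback(error, response, body).
// options.schedule defaults to setTimeout and can be swapped out for testing.
function requestWithRetry(doRequest, options, done) {
    var retriesLeft = options.retries;
    var delayMs = options.delayMs;
    var schedule = options.schedule || setTimeout;

    function attempt() {
        doRequest(function (error, response, body) {
            if (error || !response) {
                if (retriesLeft > 0) {
                    // Try again later instead of touching an undefined response.
                    retriesLeft -= 1;
                    schedule(attempt, delayMs);
                    return;
                }
                done(error || new Error("No response received"));
                return;
            }
            done(null, response, body);
        });
    }

    attempt();
}

// Possible usage with the request module (assumed to be loaded as `request`):
// requestWithRetry(function (cb) {
//     request("https://www.fakeURL.com/", { jar: true }, cb);
// }, { retries: 3, delayMs: 5000 }, function (error, response, body) {
//     if (error) { console.log(error); return; }
//     // response and body are safe to use here
// });
```

Capping the number of retries keeps a permanently failing request (such as one rejected for an invalid option) from looping forever.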

The error:

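You can't disable CORS in the browser. Node.js doesn't enforce it because it isn't a browser, but browsers apply it as a security measure for a reason. If page code could simply opt out of it, it would be pointless.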

In short: yes, you have a CORS problem if you haven't enabled it on the server.

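Protip: when working with JavaScript in the browser, it's good practice to keep the developer tools open (F12), as you do, so that CORS errors (or any other network error) show up in the console automatically. It's also good practice to switch to the Network tab and inspect the request headers, the response, and so on.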

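Edit: Just noticed this is a Chrome extension (dang). Extensions aren't subject to this restriction, so they can make these calls, as described here: https://developer.chrome.com/extensions/xhr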
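A rough sketch of what such a call might look like from extension code follows. It assumes the target origin (for example "https://www.fakeURL.com/*") is listed under "permissions" in manifest.json, which is how the linked page describes granting cross-origin access. fetchPage and its XhrCtor parameter are hypothetical names for this example; XhrCtor only exists so the function can be exercised outside a browser.

```javascript
// Hypothetical helper for extension code. Assumes manifest.json grants access
// to the target origin under "permissions".
// XhrCtor is optional and defaults to the browser's XMLHttpRequest.
function fetchPage(pageUrl, callback, XhrCtor) {
    var Xhr = XhrCtor || XMLHttpRequest;
    var xhr = new Xhr();

    xhr.open("GET", pageUrl, true);
    xhr.onload = function () {
        if (xhr.status >= 200 && xhr.status < 300) {
            callback(null, xhr.responseText);
        } else {
            callback(new Error("HTTP " + xhr.status));
        }
    };
    xhr.onerror = function () {
        callback(new Error("Network error while fetching " + pageUrl));
    };
    xhr.send();
}
```

As in the earlier advice, the callback gets the error first, so the caller can check it before touching the page text.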

Also, I checked the request npm module's source code and there is no no-cors value. I think you're mixing up the Request API and the request module.