Google Sheets: web scraping password-protected data

Asked: 2018-12-10 18:35:46

Tags: login passwords screen-scraping

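I'm trying to download a data table from www.investing.com that is only accessible once I'm logged in (free data). I have tried various sample code found online to get this working, but it keeps failing. The code doesn't throw any errors, but when I check the log it shows "didnt log in".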

(The **** in the username stands for @gmail.com.) (I don't know what the 'Přihlásit se' part of the original code refers to, so I have commented it out for now.)

function fetchAdminPage() {
  // Login endpoint and field names are taken from the sample code and have
  // not been verified against the site's actual login form.
  var url = "https://www.investing.com/";
  var options = {
    "method": "post",
    "payload": {
      "username": "rudysemail****",
      "password": "Honey20!7",
      // "send": "Přihlásit se",  // Czech for "Log in"; left over from the sample
      "_do": "signIn-submit",
      "testcookie": "1"
    },
    // Must be false: with true, UrlFetchApp follows the redirect itself and
    // the 302 checked for below is never seen.
    "followRedirects": false
  };
  var response = UrlFetchApp.fetch(url, options);
  if (response.getResponseCode() == 200) {
    // No redirect: incorrect user/pass combo (or wrong login form)
    Logger.log("didnt log in");
  } else if (response.getResponseCode() == 302) {
    // Redirected: logged in
    var headers = response.getAllHeaders();
    if (typeof headers['Set-Cookie'] !== 'undefined') {
      // Make sure that we are working with an array of cookies
      var cookies = typeof headers['Set-Cookie'] == 'string' ? [headers['Set-Cookie']] : headers['Set-Cookie'];
      for (var i = 0; i < cookies.length; i++) {
        // We only need the cookie's name=value pair; drop path, expiry time, etc.
        cookies[i] = cookies[i].split(';')[0];
      }
      url = "https://www.investing.com/portfolio/?portfolioID=MzNlNjNtZDk0Zm9gN2I%3D";
      options = {
        "method": "get",
        // Set the cookies so that we appear logged-in
        "headers": {
          "Cookie": cookies.join('; ')
        }
      };
      response = UrlFetchApp.fetch(url, options);
    }
    Logger.log(response.getContentText());
  }
}
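
The cookie handling in the function above can be pulled out into a small pure helper, which makes it easier to check on its own. This is only a sketch: `buildCookieHeader` is a hypothetical name, not part of Apps Script, and it assumes the `Set-Cookie` value arrives either as a single string or as an array of strings, as the code above already expects. It does not address whether the login request itself succeeds.

```javascript
// Hypothetical helper (not from the original sample): turns the value of a
// Set-Cookie response header into a value for a Cookie request header.
function buildCookieHeader(setCookie) {
  if (typeof setCookie === 'undefined' || setCookie === null) {
    return '';
  }
  // A single cookie arrives as a string, several as an array
  var cookies = typeof setCookie === 'string' ? [setCookie] : setCookie;
  var pairs = [];
  for (var i = 0; i < cookies.length; i++) {
    // Keep only "name=value"; drop Path, Expires, HttpOnly, etc.
    pairs.push(cookies[i].split(';')[0].trim());
  }
  // Cookie pairs in a request header are separated by "; "
  return pairs.join('; ');
}
```

In the function above, this would presumably be used as `"Cookie": buildCookieHeader(headers['Set-Cookie'])`.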

I've been working on this for hours, so any help would be great! Also, assuming this gets solved, which function writes the data into the spreadsheet?
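
On the second part, writing to a sheet in Apps Script is normally done with `getRange(row, column, numRows, numColumns)` on a sheet followed by `setValues` on the resulting range. A minimal sketch, assuming the page content has already been parsed into a 2D array of rows (that parsing step is not shown); `writeRowsToSheet` is a hypothetical helper name:

```javascript
// Hypothetical helper: writes a 2D array of values starting at cell A1.
// `sheet` is expected to be an Apps Script Sheet object.
function writeRowsToSheet(sheet, rows) {
  // setValues needs every row to have the same number of columns,
  // so find the widest row first
  var width = 0;
  for (var r = 0; r < rows.length; r++) {
    width = Math.max(width, rows[r].length);
  }
  if (rows.length === 0 || width === 0) {
    return 0; // nothing to write
  }
  // Pad shorter rows with empty strings to make the array rectangular
  var padded = [];
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i].slice();
    while (row.length < width) {
      row.push('');
    }
    padded.push(row);
  }
  sheet.getRange(1, 1, padded.length, width).setValues(padded);
  return padded.length; // number of rows written
}
```

Inside a script bound to a spreadsheet, this could be called as `writeRowsToSheet(SpreadsheetApp.getActiveSpreadsheet().getActiveSheet(), rows)`.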

Thanks

0 answers:

No answers yet