Requests no longer returns HTML - Python

Date: 2015-04-19 13:47:31

Tags: python html beautifulsoup python-requests

I am trying to get the name from a public LinkedIn URL with Python requests (Python 2.7).

The following code used to work fine:

import requests
from bs4 import BeautifulSoup

url = "https://www.linkedin.com/in/linustorvalds"
html = requests.get(url).content

# Take the page <title>, keep the part before "|" and remove the spaces
link = BeautifulSoup(html, "html.parser").title.text.split("|")[0].replace(" ", "")
print(link)

The desired output is:

linustorvalds

Instead, I now get the following error:

AttributeError: 'NoneType' object has no attribute 'text'

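The problem seems to be that the response does not contain the page's real content, so no <title> element is found. This is what printing html gives: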

<html><head>
<script type="text/javascript">
window.onload = function() {
  var newLocation = "";
  if (window.location.protocol == "http:") {
    var cookies = document.cookie.split("; ");
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        newLocation = "https:" + window.location.href.substring(window.location.protocol.length);
      }
    }
  }

  if (newLocation.length == 0) {
    var domain = location.host;
    var newDomainIndex = 0;
    if (domain.substr(0, 6) == "touch.") {
      newDomainIndex = 6;
    }
    else if (domain.substr(0, 7) == "tablet.") {
      newDomainIndex = 7;
    }
    if (newDomainIndex) {
      domain = domain.substr(newDomainIndex);
    }
    newLocation = "https://" + domain +  "/uas/login?trk=sentinel_org_block&session_redirect=" + encodeURIComponent(window.location)
  }
  window.location.href = newLocation;
}
</script>
</head></html>
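The AttributeError itself comes from calling .text on None: BeautifulSoup's .title returns None when the parsed document has no <title> element, which is exactly the case for the redirect page above. A minimal Python 3 sketch of a defensive version follows. To keep it dependency-free it uses the standard library's html.parser instead of BeautifulSoup, and the names TitleParser and extract_name are hypothetical. It returns None instead of raising when the title is missing.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element, if there is one."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None when the page has no <title>

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text outside <title> (including <script> bodies) is ignored
        if self._in_title:
            self.title += data


def extract_name(html):
    """Return the <title> text before '|' with spaces removed, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    if parser.title is None:
        return None  # e.g. the JavaScript redirect stub in the question
    return parser.title.split("|")[0].replace(" ", "")
```

With BeautifulSoup the same guard is simply checking `soup.title is None` before reading `.text`.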

Am I being blocked? What could I do to make this code work as it did before?

Thanks a lot!

1 answer:

Answer 0 (score: 0)

Try setting the User-Agent header:

html = requests.get(url, headers={"User-Agent": "Requests"}).content
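A slightly fuller sketch of the same idea, under stated assumptions: it sends the header through requests, adds a timeout and an HTTP status check, and treats a body that mentions /uas/login but has no <title> as the login stub from the question. That heuristic, the helper names is_login_redirect and fetch_profile_html, and the default User-Agent value (copied from the answer) are assumptions for illustration. Whether LinkedIn serves the real profile for any particular User-Agent is decided server-side and may change.

```python
def is_login_redirect(html):
    """Heuristic (assumption): the login stub links to /uas/login and has no <title>."""
    return "/uas/login" in html and "<title" not in html.lower()


def fetch_profile_html(url, user_agent="Requests"):
    """Fetch url with an explicit User-Agent; return the body, or None if it looks like the login stub."""
    # Third-party dependency, imported here so is_login_redirect can be used without it
    import requests

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    if is_login_redirect(response.text):
        return None
    return response.text
```

Callers can then check for None before parsing, rather than hitting the AttributeError on a page that has no title.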