Question

I am trying to get the name from a public LinkedIn URL using Python requests (Python 2.7).

The code below used to work fine.

import requests
from bs4 import BeautifulSoup

url = "https://www.linkedin.com/in/linustorvalds"
html = requests.get(url).content

link = BeautifulSoup(html).title.text.split("|")[0].replace(" ","")
print link

The desired output is:

linustorvalds

Now I get the following error message:

AttributeError: 'NoneType' object has no attribute 'text'

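The problem seems to be that html no longer contains the real content of the page, so no title element is found. This is the result of printing html: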

<html><head>
<script type="text/javascript">
window.onload = function() {
  var newLocation = "";
  if (window.location.protocol == "http:") {
    var cookies = document.cookie.split("; ");
    for (var i = 0; i < cookies.length; ++i) {
      if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
        newLocation = "https:" + window.location.href.substring(window.location.protocol.length);
      }
    }
  }

  if (newLocation.length == 0) {
    var domain = location.host;
    var newDomainIndex = 0;
    if (domain.substr(0, 6) == "touch.") {
      newDomainIndex = 6;
    }
    else if (domain.substr(0, 7) == "tablet.") {
      newDomainIndex = 7;
    }
    if (newDomainIndex) {
      domain = domain.substr(newDomainIndex);
    }
    newLocation = "https://" + domain +  "/uas/login?trk=sentinel_org_block&session_redirect=" + encodeURIComponent(window.location)
  }
  window.location.href = newLocation;
}
</script>
</head></html>

Am I being blocked? Any suggestions for making this code work the way it did before?

Thanks a lot!

Answer 1

Try setting the User-Agent header:

html = requests.get(url, headers={"User-Agent": "Requests"}).content
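To show where the AttributeError comes from and how to avoid it, here is a minimal sketch that combines the header above with a check for a missing title. The helper names name_from_html and fetch_name, the "html.parser" choice, and the 10-second timeout are my own assumptions, not part of the original code, and there is no guarantee that LinkedIn will serve the profile page for this User-Agent value.

```python
import requests
from bs4 import BeautifulSoup


def name_from_html(html):
    """Return the part of the <title> before "|" with spaces removed.

    Returns None when the page has no <title> element, which is the
    case for the JavaScript redirect page shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        # No title: probably a redirect or login page, not the profile.
        return None
    return soup.title.get_text().split("|")[0].replace(" ", "")


def fetch_name(url, user_agent="Requests"):
    """Fetch url with an explicit User-Agent header and extract the name.

    Hypothetical helper: the default user_agent mirrors the value used
    in this answer; the timeout is an illustrative choice.
    """
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return name_from_html(response.content)
```

Calling fetch_name("https://www.linkedin.com/in/linustorvalds") then returns either the extracted name or None, so the caller can tell a blocked or redirected response apart from a real profile page instead of crashing on .text.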
