I am trying to get the name from a public LinkedIn URL using Python requests (Python 2.7).
This code used to work fine.
import requests
from bs4 import BeautifulSoup
url = "https://www.linkedin.com/in/linustorvalds"
html = requests.get(url).content
link = BeautifulSoup(html).title.text.split("|")[0].replace(" ","")
print link
The desired output is:
linustorvalds
I get the following error message:
AttributeError: 'NoneType' object has no attribute 'text'
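The problem seems to be that html does not contain the real page content, so BeautifulSoup finds no <title> element and .title is None. This is what printing html gives: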
<html><head>
<script type="text/javascript">
window.onload = function() {
var newLocation = "";
if (window.location.protocol == "http:") {
var cookies = document.cookie.split("; ");
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
newLocation = "https:" + window.location.href.substring(window.location.protocol.length);
}
}
}
if (newLocation.length == 0) {
var domain = location.host;
var newDomainIndex = 0;
if (domain.substr(0, 6) == "touch.") {
newDomainIndex = 6;
}
else if (domain.substr(0, 7) == "tablet.") {
newDomainIndex = 7;
}
if (newDomainIndex) {
domain = domain.substr(newDomainIndex);
}
newLocation = "https://" + domain + "/uas/login?trk=sentinel_org_block&session_redirect=" + encodeURIComponent(window.location)
}
window.location.href = newLocation;
}
</script>
</head></html>
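One way to confirm that a response is this redirect stub rather than the profile page is a small heuristic check. The sketch below uses a hypothetical helper name, looks_like_login_redirect, and assumes the stub can be recognized by a missing <title> tag together with the "/uas/login" path seen in the script above. It is not an official detection method.

```python
def looks_like_login_redirect(html):
    """Heuristic: True if the body has no <title> tag and mentions the login path."""
    # requests' .content is bytes; accept already-decoded text as well.
    text = html.decode("utf-8", "replace") if isinstance(html, bytes) else html
    lowered = text.lower()
    # Assumption: the stub is identified by the "/uas/login" redirect target.
    return "<title" not in lowered and "/uas/login" in lowered
```

Calling looks_like_login_redirect(html) before parsing makes it possible to report the redirect clearly instead of failing with the AttributeError.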
Have I been blocked? Any suggestions for getting this code to work the way it did before?
Thanks a lot!
Answer 0 (score: 0)
Try setting the User-Agent header:
html = requests.get(url, headers={"User-Agent": "Requests"}).content
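Putting this together with a guard for the missing title, a minimal sketch might look like the following. The helper names name_from_title and fetch_name, the 10-second timeout, and the explicit "html.parser" argument are illustrative choices, not part of the original code. Whether LinkedIn serves the profile page for a given User-Agent is not guaranteed, so the function returns None instead of raising when no title is present.

```python
def name_from_title(title_text):
    """Return the part before '|' with spaces removed, or None if there is no title."""
    if not title_text:
        return None
    return title_text.split("|")[0].replace(" ", "")


def fetch_name(url, user_agent="Requests"):
    """Fetch url with a User-Agent header and extract the name from <title>."""
    # Third-party imports are kept local so name_from_title() works without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    # soup.title is None when the page has no <title>, e.g. the redirect stub.
    title = soup.title
    return name_from_title(title.text if title is not None else None)
```

For example, fetch_name("https://www.linkedin.com/in/linustorvalds") would return the cleaned-up title text if the profile page comes back, and None if the login redirect is served instead.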