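I'm trying to fetch company information from a LinkedIn company page, but I can't get any of the page's actual content. Can you tell me what's wrong?
I need to get:
company
website
industry
employees
etc.
But I can't. The only HTML I receive is shown below.
Code: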
import requests
import webbrowser,html5lib
from bs4 import BeautifulSoup
linkdine_company_about=requests.get('https://www.linkedin.com/company/exxonmobil')
html=BeautifulSoup(linkdine_company_about.text,'html.parser')
print(html)
Output:
<pre>
exxonmobil
https://www.linkedin.com/company/exxonmobil
<html><head>
<script type="text/javascript">
window.onload = function () {
// Parse the tracking code from cookies.
var trk = "bf";
var trkInfo = "bf";
var cookies = document.cookie.split("; ");
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("trkCode=") == 0) && (cookies[i].length > 8)) {
trk = cookies[i].substring(8);
} else if ((cookies[i].indexOf("trkInfo=") == 0) && (cookies[i].length > 8)) {
trkInfo = cookies[i].substring(8);
}
}
if (window.location.protocol == "http:") {
// If "sl" cookie is set, redirect to https.
for (var i = 0; i < cookies.length; ++i) {
if ((cookies[i].indexOf("sl=") == 0) && (cookies[i].length > 3)) {
window.location.href = "https:" +
window.location.href.substring(window.location.protocol.length);
return;
}
}
}
// Get the new domain. For international domains such as
// fr.linkedin.com, we convert it to www.linkedin.com
var domain = "www.linkedin.com";
if (domain != location.host) {
var subdomainIndex = location.host.indexOf(".linkedin");
if (subdomainIndex != -1) {
domain = "www" + location.host.substring(subdomainIndex);
}
}
window.location.href = "https://" + domain + "/authwall?trk=" + trk + "&trkInfo=" + trkInfo +
"&originalReferer=" + document.referrer.substr(0, 200) +
"&sessionRedirect=" + encodeURIComponent(window.location.href);
}
</script>
</head></html>
***
Process finished with exit code 0
</pre>
Answer 0 (score: 0)
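You only need to pass browser-like request headers to get through.
Don't forget to replace the empty Cookie value with your own cookies from a logged-in session.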
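One way to supply your own cookies is to copy the raw Cookie header string from the browser's developer tools and turn it into a dict. The following is a minimal sketch under that assumption; the helper name `cookie_header_to_dict` and the cookie names and values in the example are placeholders, not real LinkedIn values.

```python
def cookie_header_to_dict(raw_cookie: str) -> dict:
    """Split a raw 'name=value; name2=value2' Cookie header into a dict.

    Hypothetical helper for illustration; values are kept verbatim,
    including any surrounding quotes.
    """
    cookies = {}
    for part in raw_cookie.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Placeholder string standing in for what you copy from your own browser.
raw = 'name_a=abc123; name_b="quoted:42"'
print(cookie_header_to_dict(raw))
```

The resulting dict can be passed to `requests.get(url, headers=headers, cookies=cookies)` instead of filling in the Cookie header by hand.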
import requests
from bs4 import BeautifulSoup

headers = {
    'Host': 'www.linkedin.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'br' is left out: requests can only decode Brotli if a brotli package is installed.
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'Cookie': '',  # replace with your own cookies.
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0',
    'TE': 'Trailers'
}

r = requests.get(
    'https://www.linkedin.com/company/exxonmobil', headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# prettify is a method; without the parentheses this prints the bound method, not the HTML.
print(soup.prettify())
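To check whether the headers and cookies actually worked, it may help to test whether the response is still the redirect stub from the question, whose script sends the browser to `/authwall`. This is a rough heuristic sketch, not an official check; the function name `looks_like_authwall` is hypothetical, and a real page could in principle contain the same substring.

```python
# Path that appears in the redirect script in the question's output.
AUTHWALL_MARKER = "/authwall"


def looks_like_authwall(html_text: str) -> bool:
    """Return True if the HTML appears to be the login-redirect stub."""
    return AUTHWALL_MARKER in html_text


# Shortened copy of the stub shown in the question.
stub = (
    '<html><head><script type="text/javascript">'
    'window.location.href = "https://" + domain + "/authwall?trk=" + trk;'
    '</script></head></html>'
)
print(looks_like_authwall(stub))
print(looks_like_authwall("<html><body><h1>ExxonMobil</h1></body></html>"))
```

With the request above, `looks_like_authwall(r.text)` returning True would suggest the cookies are missing or expired and the company fields cannot be parsed from that response.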