I am trying to export the names of my LinkedIn connections using the Python BeautifulSoup module. My code is as follows:
import requests
from bs4 import BeautifulSoup
client = requests.Session()
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'
CONNECTIONS_URL = 'https://www.linkedin.com/mynetwork/invite-connect/connections/'
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find(id="loginCsrfParam-login")['value']
login_information = {
    'session_key': 'username',
    'session_password': 'password',
    'loginCsrfParam': csrf,
}
try:
    client.post(LOGIN_URL, data=login_information)
    print("Login Successful")
except:
    print("Failed to Login")
html = client.get(CONNECTIONS_URL).content
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('div', attrs={'class' : 'mn-connection-card__name'}))
The problem is that I always get an empty list, like this:
Login Successful
[]
The HTML structure of one entry looks like this:
<span class="mn-connection-card__name t-16 t-black t-bold">
Sombody's name
</span>
I think I need to change the soup.x method I am calling. I have tried find, select, and find_all, without success.
Thanks
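Answer 0 (score: 0)
If you want to extract the name, note that it sits in a span element, not a div, so you only need to search for span: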
from bs4 import BeautifulSoup
soup = BeautifulSoup(html , "html.parser")
target = soup.find_all('span', attrs={'class' : 'mn-connection-card__name'})
target[0].text.strip()
Output
"Sombody's name"
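To make the difference between the two tag names concrete, here is a minimal sketch that runs against the HTML fragment from the question rather than a live page. The SAMPLE_HTML constant and the extract_names helper are hypothetical names introduced only for this illustration; whether the real connections page delivers this markup in its raw HTML is not something this snippet can confirm.

```python
from bs4 import BeautifulSoup

# The fragment quoted in the question, used here as offline test input.
SAMPLE_HTML = """
<span class="mn-connection-card__name t-16 t-black t-bold">
  Sombody's name
</span>
"""


def extract_names(html):
    """Return the stripped text of every span carrying the name class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches when the given class is one of the element's classes.
    spans = soup.find_all("span", class_="mn-connection-card__name")
    return [span.get_text(strip=True) for span in spans]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching for a div finds nothing, because the element is a span.
div_matches = soup.find_all("div", attrs={"class": "mn-connection-card__name"})

# The same lookup expressed as a CSS selector.
span_matches = soup.select("span.mn-connection-card__name")

names = extract_names(SAMPLE_HTML)
print(len(div_matches), len(span_matches), names)
```

If extract_names returns an empty list for the HTML you actually downloaded, the selector is probably not the problem, and it is worth inspecting the downloaded markup itself.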
Answer 1 (score: 0)
I know I'm late to the party, but this currently works for LinkedIn:
import requests
from bs4 import BeautifulSoup
#create a session
client = requests.Session()
#create url page variables
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'
CONNECTIONS_URL = 'https://www.linkedin.com/mynetwork/invite-connect/connections/'
ASPIRING_DATA_SCIENTIEST = 'https://www.linkedin.com/search/results/people/?keywords=Aspiring%20Data%20Scientist&origin=GLOBAL_SEARCH_HEADER'
#get url, soup object and csrf token value
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find('input', dict(name='loginCsrfParam'))['value']
#create login parameters
login_information = {
    'session_key': 'your_email',
    'session_password': 'your_password',
    'loginCsrfParam': csrf,
}
#try and login
try:
    client.post(LOGIN_URL, data=login_information)
    print("Login Successful")
except:
    print("Failed to Login")
#open the html with soup object
# html = client.get(CONNECTIONS_URL).content #opens connections_url
html = client.get(ASPIRING_DATA_SCIENTIEST).content #opens ASPIRING_DATA_SCIENTIEST
soup = BeautifulSoup(html , "html.parser")
# print(soup.find_all('div', attrs={'class' : 'mn-connection-card__name'}))
# print(soup)
print(soup.prettify())
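One caveat about the try/except above: requests does not raise an exception for an HTTP error status or a rejected password, so "Login Successful" is printed whenever the request itself completes. The sketch below shows one possible way to look at the response instead. The describe_login helper, the UNAUTHENTICATED_MARKERS tuple, and the example.com URLs are all hypothetical and chosen for illustration; the marker strings in particular are an assumption, not documented LinkedIn behaviour.

```python
from types import SimpleNamespace

# Hypothetical URL fragments suggesting the session is still unauthenticated.
# These are assumptions for illustration only.
UNAUTHENTICATED_MARKERS = ("login", "checkpoint")


def describe_login(response, markers=UNAUTHENTICATED_MARKERS):
    """Summarise a login response using its status code and final URL.

    `response` can be a requests.Response or any object that exposes
    `status_code` and `url` (the URL reached after redirects).
    """
    if response.status_code >= 400:
        return "Failed to Login (HTTP {})".format(response.status_code)
    if any(marker in response.url for marker in markers):
        return "Probably not logged in (ended at {})".format(response.url)
    return "Login request accepted"


# Stand-in responses so the sketch runs without network access.
rejected = SimpleNamespace(status_code=403, url="https://example.com/uas/login-submit")
bounced = SimpleNamespace(status_code=200, url="https://example.com/uas/login?error=1")
accepted = SimpleNamespace(status_code=200, url="https://example.com/feed/")

for fake_response in (rejected, bounced, accepted):
    print(describe_login(fake_response))
```

With the real session this would be called as describe_login(client.post(LOGIN_URL, data=login_information)). Even a result of "Login request accepted" is only a heuristic, so printing the page as above is still a useful check.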