Collecting data from LinkedIn using Python BeautifulSoup

Time: 2019-02-27 18:33:08

Tags: python beautifulsoup linkedin

I am trying to export the names of my LinkedIn connections using the Python BeautifulSoup module. My code is as follows:

import requests
from bs4 import BeautifulSoup

client = requests.Session()

HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'
CONNECTIONS_URL = 'https://www.linkedin.com/mynetwork/invite-connect/connections/'

html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find(id="loginCsrfParam-login")['value']

login_information = {
    'session_key':'username',
    'session_password':'password',
    'loginCsrfParam': csrf,
}
try:
    client.post(LOGIN_URL, data=login_information)
    print("Login Successful")
except:
    print("Failed to Login")

html = client.get(CONNECTIONS_URL).content
soup = BeautifulSoup(html, "html.parser")
print(soup.find_all('div', attrs={'class': 'mn-connection-card__name'}))

The problem is that I always get an empty list, like this:

Login Successful
[]

The HTML structure of one entry looks like this:

<span class="mn-connection-card__name t-16 t-black t-bold">
      Sombody's name
    </span>

I think I need to change the soup.x method I am calling. I have tried find, select, and find_all, but none of them worked.

Thanks.

2 answers:

Answer 0 (score: 0)

If you want to extract the names, note that they are in span elements, not div elements, so you only need:

from bs4 import BeautifulSoup
# html is the page source fetched in the question
soup = BeautifulSoup(html, "html.parser")
# The name is in a <span>, not a <div>
target = soup.find_all('span', attrs={'class': 'mn-connection-card__name'})
target[0].text.strip()

Output:

"Sombody's name"

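To see why the tag name matters, here is a minimal offline sketch that parses the snippet from the question instead of the live page. The variable names (`div_matches`, `span_matches`, `names`, `css_names`) are only illustrative. It also assumes the names are present in the HTML that requests receives; if the page injects them with JavaScript after loading, no selector will find them in the raw response.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, used here in place of the live page.
html = """
<span class="mn-connection-card__name t-16 t-black t-bold">
      Sombody's name
    </span>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching for <div> finds nothing, because the element is a <span>.
div_matches = soup.find_all("div", class_="mn-connection-card__name")

# Searching for <span> with one of its classes matches the element.
span_matches = soup.find_all("span", class_="mn-connection-card__name")
names = [tag.get_text(strip=True) for tag in span_matches]

# The same lookup written as a CSS selector.
css_names = [
    tag.get_text(strip=True)
    for tag in soup.select("span.mn-connection-card__name")
]

print(div_matches)  # []
print(names)        # ["Sombody's name"]
```

Matching on a single class works even though the element carries several classes, since BeautifulSoup treats `class` as a multi-valued attribute.
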
Answer 1 (score: 0)

I know I'm late to the party, but this works with LinkedIn now:

import requests
from bs4 import BeautifulSoup

# Create a session
client = requests.Session()

# Page URLs
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login-submit'
CONNECTIONS_URL = 'https://www.linkedin.com/mynetwork/invite-connect/connections/'
ASPIRING_DATA_SCIENTIST = 'https://www.linkedin.com/search/results/people/?keywords=Aspiring%20Data%20Scientist&origin=GLOBAL_SEARCH_HEADER'

# Fetch the homepage and read the CSRF token from the login form
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find('input', dict(name='loginCsrfParam'))['value']

# Login parameters
login_information = {
    'session_key': 'your_email',
    'session_password': 'your_password',
    'loginCsrfParam': csrf,
}

# Try to log in (only request errors such as network failures raise here;
# rejected credentials do not)
try:
    client.post(LOGIN_URL, data=login_information)
    print("Login Successful")
except requests.RequestException:
    print("Failed to Login")

# Fetch and parse the target page
# html = client.get(CONNECTIONS_URL).content  # opens CONNECTIONS_URL
html = client.get(ASPIRING_DATA_SCIENTIST).content  # opens ASPIRING_DATA_SCIENTIST
soup = BeautifulSoup(html, "html.parser")
# print(soup.find_all('div', attrs={'class': 'mn-connection-card__name'}))

# print(soup)
print(soup.prettify())
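
One caveat about the try/except around `client.post`: requests raises an exception only when the request itself fails (for example a connection error), so "Login Successful" is printed even if the credentials were rejected. A rough sketch of a more explicit check is below. The helper `login_looks_successful` and the URL fragments in `FAILURE_MARKERS` are hypothetical and are guesses, not documented LinkedIn behavior; the sketch only assumes the object passed in has `status_code` and `url` attributes, as a requests response does.

```python
from types import SimpleNamespace

# Hypothetical URL fragments that might show the session is still logged out.
# These are assumptions for illustration, not documented LinkedIn behavior.
FAILURE_MARKERS = ("login", "checkpoint")


def login_looks_successful(response):
    """Heuristic: True if a requests-style response does not look like a failed login."""
    if response.status_code != 200:
        return False
    # response.url is the final URL after any redirects were followed.
    final_url = response.url.lower()
    return not any(marker in final_url for marker in FAILURE_MARKERS)


# Stand-ins for requests.Response objects so the sketch runs offline.
ok = SimpleNamespace(status_code=200, url="https://www.linkedin.com/feed/")
bounced = SimpleNamespace(
    status_code=200, url="https://www.linkedin.com/uas/login?error=1"
)

print(login_looks_successful(ok))       # True
print(login_looks_successful(bounced))  # False
```

In the script above this would be used as `response = client.post(LOGIN_URL, data=login_information)` followed by `login_looks_successful(response)`, with the markers adjusted to whatever the site actually returns.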