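I am trying to scrape some values from a LinkedIn page. I can get the name, education, and headline. I added code for the profile picture and the summary, but I cannot get either of them.
Any helpful hints would be much appreciated.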
def getLinkedinData(self):
    result = {}
    driver = webdriver.PhantomJS('/usr/local/bin/phantomjs', service_args=service_args)
    driver.set_window_size(1124, 850)
    google_news_trends = []
    driver.get("https://www.linkedin.com/in/joymerrillsti")
    driver.page_source.encode("utf-8")
    try:
        print driver.find_element_by_class_name('full-name').text
    except:
        pass
    # This does not give link to profile picture
    try:
        img = driver.find_element_by_class_name('profile-picture')
        for s in img:
            print s
            print s.find_element_by_tag_name('img').get_attribute('src')
    except:
        pass
    try:
        head = driver.find_element_by_id('headline-container')
        print head.text
        for s in head:
            print s.find_element_by_tag_name('p').text
    except:
        pass
    try:
        location = driver.find_element_by_id('location-container')
        for s in location:
            print s.find_element_by_tag_name('a').text
    except:
        pass
    # This does not give summary
    try:
        summary = driver.find_element_by_id('summary-item')
        for s in summary:
            print s.text
            print s.find_element_by_tag_name('p').text
    except:
        pass
    # This is fine, but is there any way to get only value for Education
    try:
        ed = driver.find_element_by_id('overview-summary-education')  # Here how to get only education value?
        print ed.text
    except:
        pass
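Answer 0 (score: 1)
I would first evaluate the LinkedIn API and see whether it can give you the information you need.
If you insist on scraping the page, I think you are missing only one thing: an Explicit Wait that waits for the page to load: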
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
driver.get("https://www.linkedin.com/in/joymerrillsti")
driver.set_window_size(1124, 850)
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "full-name")))
print driver.find_element_by_class_name('full-name').text
print driver.find_element_by_css_selector('div.profile-picture img').get_attribute('src')
print driver.find_element_by_id('headline-container').text
print driver.find_element_by_id('location-container').text
print driver.find_element_by_id('summary-item').text
print driver.find_element_by_id('overview-summary-education').text
I have also cleaned up a few things.
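For readers unfamiliar with explicit waits, the idea can be sketched in plain Python: poll a condition until it returns something truthy or a deadline passes. This is only an illustration of the technique, not Selenium's actual implementation; `wait_until` and `WaitTimeout` are hypothetical names introduced here.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=(Exception,)):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet",
    which mirrors how an element lookup fails while the page is loading.
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # not ready yet; fall through and retry
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

With a driver like the one above, a call might look like `wait_until(lambda: driver.find_element_by_class_name('full-name'))`. In real code, `WebDriverWait` with `expected_conditions` is the better choice; the sketch only shows why the lookups stop failing once a wait is in place.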
Answer 1 (score: 1)
You can find the img this way:
Note: you can use the get_attribute function to read an attribute of an element.
img = driver.find_element_by_css_selector('.profile-picture > a > img').get_attribute("src")
You can find the summary this way:
summary = driver.find_element_by_class_name('description').text
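If you want to check this kind of extraction logic without loading LinkedIn every time, one option is to parse a saved copy of `driver.page_source` offline. The sketch below uses the standard library's `html.parser` (Python 3; the module is called `HTMLParser` in Python 2) instead of Selenium. The sample markup is an assumption modelled on the class names used in this thread, not LinkedIn's real HTML, and `ProfileParser` and `extract_profile` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical markup, modelled on the selectors used in the answers.
SAMPLE_HTML = """
<div class="profile-picture"><a href="#"><img src="https://example.com/photo.jpg"></a></div>
<div id="summary-item"><p class="description">Engineer and writer.</p></div>
"""


class ProfileParser(HTMLParser):
    """Collects the first img src inside div.profile-picture and the
    text inside the first element with class "description"."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.picture_src = None
        self.summary_parts = []
        self._picture_depth = 0    # > 0 while inside div.profile-picture
        self._summary_depth = 0    # > 0 while inside the description element
        self._summary_tag = None   # tag name of the description element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._picture_depth:
                self._picture_depth += 1       # nested div inside the picture box
            elif "profile-picture" in classes:
                self._picture_depth = 1
        elif tag == "img" and self._picture_depth and self.picture_src is None:
            self.picture_src = attrs.get("src")
        if self._summary_depth:
            if tag == self._summary_tag:
                self._summary_depth += 1       # nested element of the same tag
        elif "description" in classes and not self.summary_parts:
            self._summary_tag = tag
            self._summary_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._picture_depth:
            self._picture_depth -= 1
        if self._summary_depth and tag == self._summary_tag:
            self._summary_depth -= 1

    def handle_data(self, data):
        if self._summary_depth:
            self.summary_parts.append(data)


def extract_profile(html):
    """Return the picture URL and summary text found in html (None if absent)."""
    parser = ProfileParser()
    parser.feed(html)
    summary = " ".join("".join(parser.summary_parts).split())
    return {"picture": parser.picture_src, "summary": summary or None}
```

Calling `extract_profile(driver.page_source)` after the page has loaded would then return both values in one dictionary, provided the real page uses the same class names as assumed here.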