Getting different attribute values using Selenium

Time: 2015-07-22 13:36:45

Tags: python selenium selenium-webdriver web-scraping

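I am trying to scrape some values from a LinkedIn page. I can get the name, education, and title. I added code for the profile picture and the summary, but it does not retrieve them.

Any helpful tips are much appreciated.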

def getLinkedinData(self):
    result = {}
    driver = webdriver.PhantomJS('/usr/local/bin/phantomjs' ,service_args=service_args)
    driver.set_window_size(1124, 850)
    google_news_trends = []
    driver.get("https://www.linkedin.com/in/joymerrillsti")
    driver.page_source.encode("utf-8")
    try:
        print driver.find_element_by_class_name('full-name').text#
    except:
        pass
    #This does not give link to profile picture
    try:
        img = driver.find_element_by_class_name('profile-picture')
        for s in img:
            print s
            print s.find_element_by_tag_name('img').get_attribute('src')
    except:
        pass

    try:
        head = driver.find_element_by_id('headline-container')
        print head.text
        for s in head:
            print s.find_element_by_tag_name('p').text
    except:
        pass

    try:
        location = driver.find_element_by_id('location-container')
        for s in location:
            print s.find_element_by_tag_name('a').text
    except:
        pass
    #This does not give summary
    try:
        summary = driver.find_element_by_id('summary-item')
        for s in summary:
            print s.text
            print s.find_element_by_tag_name('p').text
    except:
        pass
    #This is fine, but is there any way to get only value for Education
    try:
        ed = driver.find_element_by_id('overview-summary-education') #Here how to get only education value?
        print ed.text
    except:
        pass

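2 Answers:

Answer 0 (score: 1)

I would first evaluate the LinkedIn API and see whether it can give you the information you need.

If you insist on web-scraping the page, I think you are missing only one thing: an Explicit Wait to wait for the page to load: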

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
driver.get("https://www.linkedin.com/in/joymerrillsti")
driver.set_window_size(1124, 850)

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "full-name")))

print driver.find_element_by_class_name('full-name').text
print driver.find_element_by_css_selector('div.profile-picture img').get_attribute('src')
print driver.find_element_by_id('headline-container').text
print driver.find_element_by_id('location-container').text
print driver.find_element_by_id('summary-item').text
print driver.find_element_by_id('overview-summary-education').text

I have also cleaned up a few things.

Answer 1 (score: 1)

You can find the img this way:

Note: you can read an element's attribute with the get_attribute function.
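
To make the idea behind an explicit wait concrete, here is a minimal sketch of the polling loop such a wait performs: check a condition repeatedly until it holds or a timeout expires. The helper name wait_until, its parameters, and the stand-in page_ready condition are hypothetical and used only for illustration. This is not Selenium's own implementation, and unlike a real wait it treats every exception as "not ready yet".

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper mirroring the idea of an explicit wait;
    it is not Selenium's implementation.
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except Exception:
            # Simplification: any error (e.g. "element not found yet")
            # is treated as "not ready".
            pass
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Stand-in condition that only becomes true on the third poll,
# imitating a page element that appears after a short delay.
attempts = []


def page_ready():
    attempts.append(1)
    return "loaded" if len(attempts) >= 3 else None


result = wait_until(page_ready, timeout=2.0, poll_interval=0.01)
```

In the answer's code, WebDriverWait(driver, 10).until(...) plays the role of wait_until, and the expected condition plays the role of page_ready.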

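# 'profile-picture>a>img' is a CSS selector, not a class name, so use the CSS locator
img = driver.find_element_by_css_selector('.profile-picture>a>img').get_attribute("src")

You can find the summary this way: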

summary = driver.find_element_by_class_name('description').text
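
As a side note on why the question's loops printed nothing: a find_element_* call returns a single element, which is not iterable, so a loop such as "for s in summary" raises a TypeError that the bare "except: pass" then hides. The sketch below reproduces that with a stand-in class instead of a real browser. The names try_extract, SingleElement, and loop_over are hypothetical and exist only for this illustration.

```python
def try_extract(label, extractor, errors):
    """Run `extractor()`; on failure, record why instead of silently passing.

    Hypothetical helper for illustration only.
    """
    try:
        return extractor()
    except Exception as exc:
        errors.append("%s: %s: %s" % (label, type(exc).__name__, exc))
        return None


class SingleElement(object):
    """Stand-in for the single, non-iterable object a find_element_* call returns."""
    text = "Example summary"


def loop_over(element):
    # Same shape as the question's `for s in summary:` loop.
    return [s.text for s in element]


errors = []
element = SingleElement()

# Looping over a single element fails; reading .text directly works.
looped = try_extract("summary (loop)", lambda: loop_over(element), errors)
direct = try_extract("summary (direct)", lambda: element.text, errors)
```

Inspecting errors afterwards shows the TypeError that was being swallowed, which is usually more useful while debugging a scraper than a silent pass.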