Scraping LinkedIn profile information with Selenium

Time: 2021-02-03 14:57:16

Tags: python selenium selenium-webdriver web-scraping

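I am trying to scrape profiles from LinkedIn. I collect the profile URLs with the code below and want to pass them to driver.get(URL), but the URLs I scrape come out in a different format, e.g. wrapped in [ ] brackets, and I get this error:

selenium.common.exceptions.InvalidArgumentException: Message: invalid argument: 'url' must be a string

Could you please suggest how to get the URLs into the list linklist = [ ] in the correct format, so that I can pass them to driver.get(URL)? Thanks!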

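from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from tqdm import tqdm

options = Options()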
options.add_argument("--start-maximized")
options.headless = True


url = "https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin"
driver = webdriver.Chrome(path, options=options)

driver.get(url)
driver.find_element_by_id('username').send_keys('name')
driver.find_element_by_id('password').send_keys('password', Keys.ENTER)
driver.implicitly_wait(10)
driver.find_element_by_class_name('search-global-typeahead__input').send_keys('Marketing manager', Keys.ENTER)
driver.implicitly_wait(10)
driver.find_element_by_xpath('//button[text()="People"]').click()


x = 0
profile = []
linklist = []
condition = True
while condition:
    sleep(2)
    driver.execute_script("window.scrollTo(0, 1400);")
    driver.implicitly_wait(10)
    linkedin_members = driver.find_elements_by_xpath('//span[@class="entity-result__title"]')
    links = [linkedin_member.find_element_by_xpath('.//a[@class="app-aware-link"]').get_attribute('href') for linkedin_member in linkedin_members if "/in/" in linkedin_member.find_element_by_xpath('.//a[@class="app-aware-link"]').get_attribute('href')]

    x = x + 1
    linklist.append(link for link in links)
    driver.implicitly_wait(10)
    driver.find_element_by_xpath("""//button[@class='artdeco-pagination__button artdeco-pagination__button--next artdeco-button artdeco-button--muted artdeco-button--icon-right artdeco-button--1 artdeco-button--tertiary ember-view' and contains(.,'Next')]""").click()
    if x == 2:
        condition = False

profile = []

for l in tqdm(linklist):
    driver.get(l)

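1 Answer:

Answer 0 (score: 0)

I replaced your while loop with a for loop, because the condition never depends on anything but the counter and you only want to loop twice.

You can do it like this: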

import time

linklist = []
for i in range(2):  # visit two result pages
    time.sleep(2)
    driver.execute_script("window.scrollTo(0, 1400);")
    driver.implicitly_wait(10)
    linkedin_members = driver.find_elements_by_xpath('//span[@class="entity-result__title"]')

    # Collect the profile link of every result on the page, one string at a time
    for member in linkedin_members:
        link = member.find_element_by_class_name('app-aware-link').get_attribute('href')
        if link and "/in/" in link:
            linklist.append(link)
    driver.implicitly_wait(10)
    driver.find_element_by_xpath("""//button[@class='artdeco-pagination__button artdeco-pagination__button--next artdeco-button artdeco-button--muted artdeco-button--icon-right artdeco-button--1 artdeco-button--tertiary ember-view' and contains(.,'Next')]""").click()

# Every entry is now a plain string, so driver.get() accepts it
for url in linklist:
    driver.get(url)

I looked up the class of the element that contains the profile URL and used ".get_attribute('href')" to extract the URL.
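
For what it's worth, the error in the question most likely comes from the line linklist.append(link for link in links): append() stores a single generator object in the list rather than the URL strings, so driver.get() later receives something that is not a string. Below is a minimal sketch of the difference that runs without Selenium. The URLs are made up, and collect_profile_links is a hypothetical helper that is not part of the original code.

```python
def collect_profile_links(hrefs):
    """Keep hrefs that contain "/in/", skipping None values and duplicates."""
    seen = set()
    result = []
    for href in hrefs:
        if href and "/in/" in href and href not in seen:
            seen.add(href)
            result.append(href)
    return result


# Made-up hrefs standing in for one page of search results
page_links = [
    "https://www.linkedin.com/in/example-one",
    "https://www.linkedin.com/company/example",
    None,
    "https://www.linkedin.com/in/example-two",
    "https://www.linkedin.com/in/example-one",
]
links = collect_profile_links(page_links)

# What the question's code does: append() adds ONE generator object
broken = []
broken.append(link for link in links)
print(type(broken[0]).__name__)  # generator

# extend() adds each URL string individually
linklist = []
linklist.extend(links)
print(all(isinstance(url, str) for url in linklist))  # True
```

In the original loop, swapping linklist.append(link for link in links) for linklist.extend(links) should be enough to end up with a flat list of strings.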