我正在尝试从LinkedIn上的知名人士那里抓取一些数据,但我遇到了一些问题。我想执行以下操作:
RSelenium
登录并“单击”“显示1项更多的知识”-以及“显示1项更多的经验”(请注意,Hadley会这样做)不能选择“显示1个更多的经验”,但是可以选择“显示1个更多的教育”)。
(通过点击“显示更多的经验/教育”,我可以从页面上获取全部的教育和经验)。另外,特德·克鲁兹(Ted Cruz)可以选择“展示5个更多的体验”,我想扩展和抓取。代码:
library(RSelenium)
library(rvest)
library(stringr)
library(xml2)
userID = "myEmailLogin" # The linkedIn email to login
passID = "myPassword" # and LinkedIn password
try(rsDriver(port = 4444L, browser = 'firefox'))
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://www.linkedin.com/login")
user <- remDr$findElement(using = 'id',"username")
user$sendKeysToElement(list(userID,key="tab"))
pass <- remDr$findElement(using = 'id',"password")
pass$sendKeysToElement(list(passID,key="enter"))
Sys.sleep(5) # give the page time to fully load
# Navgate to individual profiles
# remDr$navigate("https://www.linkedin.com/in/thejlo/") # Jennifer Lopez
# remDr$navigate("https://www.linkedin.com/in/cruzted/") # Ted Cruz
remDr$navigate("https://www.linkedin.com/in/hadleywickham/") # Hadley Wickham
Sys.sleep(5) # give the page time to fully load
html <- remDr$getPageSource()[[1]]
signals <- read_html(html)
personFullNameLocationXPath <- '/html/body/div[9]/div[3]/div/div/div/div/div[2]/main/div[1]/section/div[2]/div[2]/div[1]/ul[1]/li[1]'
personName <- signals %>%
html_nodes(xpath = personFullNameLocationXPath) %>%
html_text()
personTagLineXPath = '/html/body/div[9]/div[3]/div/div/div/div/div[2]/main/div[1]/section/div[2]/div[2]/div[1]/h2'
personTagLine <- signals %>%
html_nodes(xpath = personTagLineXPath) %>%
html_text()
personLocationXPath <- '//*[@id="ember49"]/div[2]/div[2]/div[1]/ul[2]/li[1]'
personLocation <- signals %>%
html_nodes(xpath = personLocationXPath) %>%
html_text()
personLocation %>%
gsub("[\r\n]", "", .) %>%
str_trim(.)
# Here is where I have problems
personExperienceTotalXPath = '//*[@id="experience-section"]/ul'
personExperienceTotal <- signals %>%
html_nodes(xpath = personExperienceTotalXPath) %>%
html_text()
最后一个错误personExperienceTotal
是我出问题的地方...我似乎无法刮除experience-section
。当我放置自己的LinkedIn URL(或一些随机的人)时,它似乎可以工作...
我的问题是,如何单击expand experience/education
并刮擦这些部分?