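Using RSelenium and findElements with Inspect Element

Date: 2014-09-10 09:33:01

Tags: r web-scraping

I would like help getting each Bible verse from the following website as one row (a single string) in a data frame.

I am struggling to find the right element and don't know how to use findElements() together with the browser's Inspect Element tool. Any pointers on how to do the same for the other parts of the page, e.g. cross-references and footnotes, would be great. (Note that the cross-references can be shown by clicking the gear near the top of the page and adjusting 'Page Options'.)

Here is the code I have tried.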

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElements('id','passage-text')

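1 Answer:

Answer 0 (score: 8):

Normally I would start by locating the relevant HTML. Inspecting the page with Firebug in Firefox (or a similar tool) shows that the relevant HTML snippet is <div class="version-ESV result-text-style-normal text-html ">. So we can find the element with the class version-ESV: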

chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElement('class', 'version-ESV')
webElem$highlightElement() # check visually we have the right element

The highlightElement method gives us visual confirmation that we have the HTML block we want. Finally, we can retrieve this HTML snippet using the getElementAttribute method:

appData <- webElem$getElementAttribute("outerHTML")[[1]]

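The verses can then be parsed out of this HTML using the XML package.

Update:

The individual verses sit in span elements whose id starts with "en-ESV-", so we can target them with the XPath '//span[contains(@id,"en-ESV-")]'. Within these spans, however, we only want the child nodes that are text nodes. Once we have those text nodes, we paste them together separated by a space: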
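Since the question asks about findElements() specifically, here is a minimal sketch of that route. Unlike findElement(), findElements() returns a list of web elements, so the text has to be pulled from each one in turn. The helper name getVerseText is hypothetical, and the default XPath assumes the verse spans carry ids containing "en-ESV-", as the inspector shows for this page. The text returned by getElementText() is whatever the browser renders inside each span, so verse numbers and footnote markers may be included.

```r
# Hypothetical helper: collect the rendered text of every element matching `xpath`.
# `remDr` is assumed to be an open RSelenium remoteDriver that has already
# navigated to the chapter page.
getVerseText <- function(remDr, xpath = '//span[contains(@id,"en-ESV-")]') {
  # findElements() returns a list with one web element per match
  elems <- remDr$findElements(using = "xpath", value = xpath)
  # getElementText() returns a list; its first entry is the text
  vapply(elems, function(el) el$getElementText()[[1]], character(1))
}

# Usage, with the session from the code above:
# verseText <- getVerseText(remDr)
```

This avoids a separate parsing step, at the cost of less control over which child nodes contribute to the text.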

library(XML) # provides htmlParse, xpathSApply, xmlChildren and xmlValue
appXPATH <- '//span[contains(@id,"en-ESV-")]'
appFunc <- function(x){
  appChildren <- xmlChildren(x)
  out <- appChildren[names(appChildren) == "text"]
  paste(sapply(out, xmlValue), collapse = ' ')
}
doc <- htmlParse(appData, encoding = 'UTF8') # specify encoding
results <- xpathSApply(doc, appXPATH, appFunc)

This gives the following result:

> head(results)
[1] "Then Joseph  fell on his father's face and wept over him and kissed him."                                                                                                                                                   
[2] "And Joseph commanded his servants the physicians to  embalm his father. So the physicians embalmed Israel."                                                                                                                 
[3] "Forty days were required for it, for that is how many are required for embalming. And the Egyptians  wept for him seventy days."                                                                                            
[4] "And when the days of weeping for him were past, Joseph spoke to the household of Pharaoh, saying,  “If now I have found favor in your eyes, please speak in the ears of Pharaoh, saying,"                                   
[5] "‘My father made me swear, saying, “I am about to die: in my tomb  that I hewed out for myself in the land of Canaan, there shall you bury me.” Now therefore, let me please go up and bury my father. Then I will return.’”"
[6] "And Pharaoh answered, “Go up, and bury your father, as he made you swear.”"
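
To get from this character vector to the data frame the question asks for, something along the following lines should work. This is a sketch under stated assumptions: the two strings stand in for the full results vector, the column names verse and text are made up for illustration, and the verse numbering assumes the vector starts at verse 1 with no gaps. The doubled spaces in the output are probably left where non-text child nodes were dropped, so they are collapsed first.

```r
# Two verses copied from the output above, standing in for the full `results`
results <- c(
  "Then Joseph  fell on his father's face and wept over him and kissed him.",
  "And Joseph commanded his servants the physicians to  embalm his father. So the physicians embalmed Israel."
)

# Collapse runs of whitespace into a single space and trim the ends
clean <- trimws(gsub("\\s+", " ", results))

# One row per verse; column names are illustrative, and the numbering
# assumes `results` begins at verse 1
verses <- data.frame(
  verse = seq_along(clean),
  text = clean,
  stringsAsFactors = FALSE
)
str(verses)
```

Setting stringsAsFactors = FALSE keeps the verse text as plain character strings rather than factors, which matters on older versions of R.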