How do I get the HTML of the list element whose text == "some value"?

Posted: 2016-06-02 19:56:15

Tags: r rselenium

I have the following HTML code:

<ul class="list" role="listbox" id="list1">

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-1">
      choice1
    </div>
  </li>

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-2">
      choice2
    </div>
  </li>

  <li class="lvl2">
    <div class="lvl3" id="lvl3-nb-3">
      choice3
    </div>
  </li>

</ul>

I want to get the HTML of the element whose text == "choice2" (outer HTML, HTML + element, selector, XPath, it doesn't really matter).

How can I do this with RSelenium?

Thanks

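EDIT for clarification: the ids of the list elements are dynamic (basically random), so the solution can't refer to them in HTML or CSS selectors. However, I do know the values choice1, choice2, and choice3 (and basically everything else: for example, I know the classes will be called list, lvl2, and lvl3).

An attempt at a reproducible example: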

HTML:

<ul class="list" id="list1">
  <li class="lvl2">
    <div class="lvl3" id="n123">
      paul
    </div>
  </li>
  <li class="lvl2">
    <div class="lvl3" id="n471">
      john
    </div>
  </li>
  <li class="lvl2">
    <div class="lvl3" id="n951">
      ringo
    </div>
  </li>
</ul>

R:

> library(RSelenium)
> startServer()
> mybrowser <- remoteDriver()
> mybrowser$open()
> mybrowser$navigate("http://example.com")
> list_of_beatles <- mybrowser$findElement(using = 'css selector', "ul#list1.list")

> print(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")))
[1] "paul"                              "john"              
[3] "ringo"

> # Let's say I want john's CSS selector; I'd want something like this:
> css_selector_of_this_thing(which(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")) == "john"))
> # which would output, for instance, "div#n471.lvl3"
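The css_selector_of_this_thing() call above is pseudocode. One way to turn the which() idea into a usable locator is to build a positional :nth-child() selector from the index. This is a minimal sketch under stated assumptions: nth_item_selector() is a hypothetical helper (not part of RSelenium), every li is assumed to hold exactly one line of visible text, the ul is assumed to have only li children, and the default list selector ul#list1.list is taken from the example HTML.

```r
# Hypothetical helper (not part of RSelenium): map the position of `value`
# in the list's visible text to a positional CSS selector.
# Assumes each <li> holds exactly one line of text and the <ul> has only
# <li> children, so line i of the text belongs to li:nth-child(i).
nth_item_selector <- function(list_text, value, list_css = "ul#list1.list") {
  items <- trimws(strsplit(list_text, "\n", fixed = TRUE)[[1]])
  idx <- which(items == value)
  if (length(idx) != 1) {
    stop("expected exactly one item equal to '", value, "', found ", length(idx))
  }
  sprintf("%s > li:nth-child(%d) > div.lvl3", list_css, idx)
}

# Same string as list_of_beatles$getElementText()[[1]] in the example
beatles_text <- "paul\njohn\nringo"
nth_item_selector(beatles_text, "john")
#> [1] "ul#list1.list > li:nth-child(2) > div.lvl3"

# In a live session (not run here):
# john <- mybrowser$findElement(using = "css selector",
#                               nth_item_selector(beatles_text, "john"))
```

Because the comparison is exact equality on the trimmed lines, "john" does not accidentally match "johnny". The selector breaks if the list ever contains items that span several lines or non-li children.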

1 Answer

Answer (score: 1)

If you know the classes will be called list, lvl2, and lvl3, and that your text will appear in the tag with class lvl3, then you can use xpath:

result <- mybrowser$findElement(using = 'xpath',
    "//ul[@class = 'list']/*[@class = 'lvl2']/*[@class = 'lvl3'][contains(., 'john')]")

or, more simply:

result2 <- mybrowser$findElement(using = 'xpath', "//*[@class = 'lvl3'][contains(., 'john')]")

result$getElementAttribute("outerHTML")[[1]]
# [1] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"

result$getElementTagName()[[1]] # or result$getElementAttribute("tag")[[1]]
# [1] "div"

result$getElementAttribute("class")[[1]]
# [1] "lvl3"

result$getElementAttribute("id")[[1]]
# [1] "n471"

Edit:

Based on the OP's comments, it is sometimes necessary to distinguish john from johnny and saint john, and contains(., 'john') matches all three. There may be an xpath-based way to do this, but I haven't figured one out yet (suggestions/edits welcome). So I'll apply some regex after an initial xpath:

# use findElements (plural) to get multiple elements
result <- mybrowser$findElements(using = 'xpath', "//*[@class = 'lvl3'][string()]")
# loop through the results and gather each outerHTML to examine with regex
choices <- unlist(lapply(result, function(x) x$getElementAttribute("outerHTML")))

Assuming we add johnny as another entry, choices will look like this:

choices
#[1] "<div class=\"lvl3\" id=\"n123\">\n      paul\n    </div>"
#[2] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"
#[3] "<div class=\"lvl3\" id=\"n951\">\n      ringo\n    </div>"
#[4] "<div class=\"lvl3\" id=\"n952\">\n      johnny\n    </div>"

Then we can use a regular expression to pick out the right one.
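A minimal sketch of that regex step follows; it is an illustration, not necessarily the code the answer used. It assumes the outerHTML strings look like the printed choices (copied in below so the block runs without a browser), that the target div has no nested tags, and that the name contains no regex metacharacters. exact_text_match() is a hypothetical helper.

```r
# outerHTML strings as printed for `choices` (in a live session they come
# from the findElements() loop)
choices <- c(
  "<div class=\"lvl3\" id=\"n123\">\n      paul\n    </div>",
  "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>",
  "<div class=\"lvl3\" id=\"n951\">\n      ringo\n    </div>",
  "<div class=\"lvl3\" id=\"n952\">\n      johnny\n    </div>"
)

# Hypothetical helper: keep entries whose text between '>' and '<' is
# exactly `name`, ignoring surrounding whitespace (including newlines)
exact_text_match <- function(html, name) {
  pattern <- paste0(">\\s*", name, "\\s*<")
  html[grepl(pattern, html)]
}

john_html <- exact_text_match(choices, "john")
john_html
#> [1] "<div class=\"lvl3\" id=\"n471\">\n      john\n    </div>"

# Pull the id out of the matching string via the capture group
john_id <- regmatches(john_html, regexec('id="([^"]+)"', john_html))[[1]][2]
john_id
#> [1] "n471"
```

Anchoring the name between `>` and `<` is what separates john from johnny and saint john. For a pure-XPath route, XPath 1.0's normalize-space() trims and collapses whitespace, so a locator such as `//*[@class = 'lvl3'][normalize-space(.) = 'john']` should also match only the exact text (untested against this page).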

From there, the methods shown above can be used to get the tag name, class, and id.
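The question ultimately asked for something like a CSS selector. Once the tag name, id, and class strings are in hand, combining them is ordinary string work. A minimal sketch, assuming the id and class values need no CSS escaping; css_selector_from_parts() is a hypothetical helper, not part of RSelenium:

```r
# Hypothetical helper: build a selector of the form tag#id.class from the
# strings returned by getElementTagName() and getElementAttribute().
# Multiple classes ("lvl3 active") become ".lvl3.active".
css_selector_from_parts <- function(tag, id = "", class_attr = "") {
  selector <- tolower(tag)  # some drivers report tag names in upper case
  if (nzchar(id)) {
    selector <- paste0(selector, "#", id)
  }
  class_attr <- trimws(class_attr)
  if (nzchar(class_attr)) {
    classes <- strsplit(class_attr, "\\s+")[[1]]
    selector <- paste0(selector, paste0(".", classes, collapse = ""))
  }
  selector
}

css_selector_from_parts("div", "n471", "lvl3")
#> [1] "div#n471.lvl3"

# In a live session (not run here):
# css_selector_from_parts(result$getElementTagName()[[1]],
#                         result$getElementAttribute("id")[[1]],
#                         result$getElementAttribute("class")[[1]])
```

Keep in mind that in this question the ids are random, so a selector containing the id is only useful within the same page load.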