I have the following HTML code:
<ul class="list" role="listbox" id="list1">
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-1">
choice1
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-2">
choice2
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="lvl3-nb-3">
choice3
</div>
</li>
</ul>
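I want to get the element whose text == "choice2". How can I do this using RSelenium?
Thanks
Edit for clarification: the ids of the list elements are dynamic (essentially random), so the solution I need cannot refer to them in its HTML or CSS. However, I do know the values of choice1, choice2 and choice3 (and basically everything else: for instance, I know the classes will be called list, lvl2 and lvl3).
Attempt at a reproducible example: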
HTML:
<ul class="list" id="list1">
<li class="lvl2">
<div class="lvl3" id="n123">
paul
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="n471">
john
</div>
</li>
<li class="lvl2">
<div class="lvl3" id="n951">
ringo
</div>
</li>
</ul>
R:
> library(RSelenium)
> startServer()
> mybrowser <- remoteDriver()
> mybrowser$open()
> mybrowser$navigate("http://example.com")
> list_of_beatles <- mybrowser$findElement(using = 'css selector', "ul#list1.list")
> print(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")))
[1] "paul" "john"
[3] "ringo"
> # Let's say I want john's CSS selector, I'd want something kind of like this:
> css_selector_of_this_thing(which(unlist(strsplit(as.character(list_of_beatles$getElementText()), "\n")) == "john"))
> # Which would output, for instance "div#n471.lvl3"
Answer 0 (score: 1)
If you know that the classes will be called list, lvl2 and lvl3, and that your text will appear in a tag with class lvl3, then you can use xpath:
result <- mybrowser$findElement(using = 'xpath',
"//ul[@class = 'list']/*[@class = 'lvl2']/*[@class = 'lvl3'][contains(., 'john')]")
result$getElementAttribute("outerHTML")[[1]]
# [1] "<div class=\"lvl3\" id=\"n471\">\n john\n </div>"
result$getElementTagName()[[1]] # or result$getElementAttribute("tag")[[1]]
# [1] "div"
result$getElementAttribute("class")[[1]]
# [1] "lvl3"
result$getElementAttribute("id")[[1]]
# [1] "n471"
Or more simply:
result2 <- mybrowser$findElement(using = 'xpath',
"//*[@class = 'lvl3'][contains(., 'john')]")
Per the OP's comment, it is sometimes necessary to distinguish john from saint john and johnny. There may be an xpath-based way to do this, but I haven't figured one out yet (suggestions/edits welcome). So I'll apply some regex after an initial xpath:
# use findElements (plural) to get multiple elements
result <- mybrowser$findElements(using = 'xpath',
"//*[@class = 'lvl3'][string()]")
# loop through results and gather outerHTML to examine with regex
choices <- unlist(lapply(result, function(x) x$getElementAttribute("outerHTML")))
Suppose we add johnny as another entry; choices will then look like this:
#[1] "<div class=\"lvl3\" id=\"n123\">\n paul\n </div>"
#[2] "<div class=\"lvl3\" id=\"n471\">\n john\n </div>"
#[3] "<div class=\"lvl3\" id=\"n951\">\n ringo\n </div>"
#[4] "<div class=\"lvl3\" id=\"n952\">\n johnny\n </div>"
We can then use a regex on choices to find the right entry, and the methods shown above can be used on it to get the tag name, class, and id.
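The regex filtering step could be sketched roughly as follows. This is only an illustration under stated assumptions: the choices vector is hard-coded (with a hypothetical saint john entry and a made-up id n953) so that the snippet runs without a browser session, and the pattern is one possible choice, not necessarily the one the answer had in mind.

```r
# Hypothetical stand-in for the `choices` vector gathered from outerHTML;
# the "saint john" entry and its id are invented for illustration.
choices <- c(
  "<div class=\"lvl3\" id=\"n123\">\n paul\n </div>",
  "<div class=\"lvl3\" id=\"n471\">\n john\n </div>",
  "<div class=\"lvl3\" id=\"n951\">\n ringo\n </div>",
  "<div class=\"lvl3\" id=\"n952\">\n johnny\n </div>",
  "<div class=\"lvl3\" id=\"n953\">\n saint john\n </div>"
)

# Keep only entries whose entire text content, ignoring surrounding
# whitespace, is exactly "john" (so "johnny" and "saint john" are excluded).
pattern <- ">\\s*john\\s*</div>$"
idx <- grep(pattern, choices, perl = TRUE)
idx
# [1] 2
```

In a live session, result[[idx]] would then be the matching element, so result[[idx]]$getElementAttribute("id")[[1]] should return its id. An XPath predicate such as [normalize-space(.) = 'john'] in place of contains(., 'john') may give the same exact match without a regex, though that is not tested here.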