For example, given this HTML:
<p>hello world</p>
<p>the weather is fine today</p>
<p>it is fine in a lot of places in the world</p>
For the keyword "world", the result would be:
hello world
it is fine in a lot of places in the world
Answer 0 (score: 1)
Oh, so we're a code-writing service. Heh. Perhaps we can do it all with XPath rather than spinning unnecessary cycles in R:
library(xml2)
library(rvest)
doc_txt <- "<p>hello world</p>
<p>the weather is fine today</p>
<p>it is fine in a lot of places in the world</p>"
doc <- read_html(doc_txt)
# Select every <p> that has a text node containing 'world', then extract its text
xml_text(html_nodes(doc, xpath="//p[text()[contains(.,'world')]]"))
## [1] "hello world"
## [2] "it is fine in a lot of places in the world"
A similar idiom would work in the XML package if you can't upgrade to the Hadleyverse.
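If the keyword changes from call to call, splicing it into the XPath string means dealing with quoting. A minimal sketch of one alternative, assuming only the xml2 package: select every <p> with XPath and filter the extracted text in R. The helper name find_paragraphs is hypothetical and not part of any package. Because it filters on the full text of each paragraph, it also matches keywords that sit inside nested elements, which the text()-based XPath above does not.

```r
library(xml2)

# Hypothetical helper: return the text of every <p> whose text contains `keyword`.
find_paragraphs <- function(html, keyword) {
  doc <- read_html(html)
  # Extract the text of all <p> elements.
  txt <- xml_text(xml_find_all(doc, "//p"))
  # fixed = TRUE treats the keyword as a literal string, not a regex.
  txt[grepl(keyword, txt, fixed = TRUE)]
}

html <- "<p>hello world</p>
<p>the weather is fine today</p>
<p>it is fine in a lot of places in the world</p>"

find_paragraphs(html, "world")
```

Calling find_paragraphs(html, "weather") with the same input would return only the second paragraph.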
Answer 1 (score: 0)
Here are two alternatives:
1) XML. Using the XML package:
Lines <- "<p>hello world</p>
<p>the weather is fine today</p>
<p>it is fine in a lot of places in the world</p>"
library(XML)
doc <- htmlTreeParse(Lines, asText = TRUE, useInternalNodes = TRUE)
grep("hello", xpathSApply(xmlRoot(doc), "//p", xmlValue), value = TRUE)
which gives:
[1] "hello world"
2) Regular expressions. If <p> and </p> always appear on the same line, as they do in the example, then this would also work:
L <- readLines(textConnection(Lines))
gsub(".*<p>|</p>.*", "", grep("<p>.*hello", L, value = TRUE))
which gives:
[1] "hello world"
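The regular-expression approach can be wrapped so that the keyword is a parameter. The following is a rough base-R sketch under the same assumption as above, namely that each line holds exactly one complete <p>...</p> pair. The function name extract_matching is hypothetical. It strips the tags before filtering, so a keyword such as "p" is not matched against the markup itself.

```r
# Hypothetical helper: text of each single-line <p>...</p> that contains `keyword`.
extract_matching <- function(lines, keyword) {
  # Keep only lines that hold a complete <p>...</p> pair.
  p_lines <- grep("<p>.*</p>", lines, value = TRUE)
  # Drop everything up to and including <p>, and from </p> onward.
  txt <- gsub(".*<p>|</p>.*", "", p_lines)
  # fixed = TRUE treats the keyword as a literal string, not a regex.
  txt[grepl(keyword, txt, fixed = TRUE)]
}

lines <- c("<p>hello world</p>",
           "<p>the weather is fine today</p>",
           "<p>it is fine in a lot of places in the world</p>")

extract_matching(lines, "world")
```

This sketch breaks down as soon as a paragraph spans several lines or a line carries more than one paragraph, in which case a real HTML parser is the safer choice.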