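I have a local HTML file, as shown below:
Would someone be kind enough to tell me how to approach this, i.e. how to scrape a few rows out of a layout like this?
This is one of many unsuccessful attempts: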
library(XML)
example.html <- scan(file=file.choose(),what="character")
parse.html <- htmlTreeParse(example.html, useInternalNodes = TRUE)
xpath.val <- xpathApply(parse.html, '//div', xmlValue)
g.val <- gsub('\\s', '', xpath.val)
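If anyone is interested in seeing the HTML file itself, it is here
Edit: I certainly don't expect anyone to solve this for me. I would be happy just to be told where to look.

Answer 0 (score: 1)
OK, this doesn't get you exactly the same thing, but maybe it will help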
library(XML)
library(stringr)
namespaces=c(xmlns="http://www.xbrl.org/2008/inlineXBRL")
parse.html <- htmlTreeParse("~/Downloads/html1.html", useInternalNodes=TRUE)
tt <- xpathApply(parse.html, '//tr[@class="iris_table_row"]', namespaces=namespaces)
# Return the non-empty <td> cell values of one table row, trimmed of whitespace.
foo <- function(x){
  vals <- sapply(xmlChildren(x), xmlValue)
  str_trim(vals[names(vals) %in% "td" & sapply(vals, nchar)>0], "both")
}
rows <- lapply(tt, foo)
rows[170:175]
[[1]]
td
"%"
[[2]]
td td
"Class of shares:" "holding"
[[3]]
td td
"Ordinary" "100.00"
[[4]]
td td
"Page 5" "continued..."
[[5]]
td
"Whitton Park Estates Limited (Registered number: 00231549)"
[[6]]
td
"Notes to the Abbreviated Accounts - continued"
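As a possible next step, the two-cell rows (such as "Ordinary" / "100.00") could be collected into a matrix. The following is only a minimal sketch on a made-up HTML snippet, not the original file: the `html` string, the `cells` and `pairs` names, and the choice to keep only rows with exactly two non-empty cells are assumptions for illustration.

```r
library(XML)

# Hypothetical stand-in for the real file: rows sharing the "iris_table_row" class.
html <- '<html><body><table>
  <tr class="iris_table_row"><td>Class of shares:</td><td> holding </td><td></td></tr>
  <tr class="iris_table_row"><td>Ordinary</td><td>100.00</td></tr>
  <tr class="other"><td>ignored</td></tr>
</table></body></html>'

# Parse the string directly (asText = TRUE) and select the rows of interest.
doc <- htmlParse(html, asText = TRUE)
row_nodes <- getNodeSet(doc, '//tr[@class="iris_table_row"]')

# For each row, take the text of its <td> children, trim it, and drop empty cells.
cells <- lapply(row_nodes, function(row) {
  vals <- trimws(xpathSApply(row, "./td", xmlValue))
  vals[nzchar(vals)]
})

# Keep only rows with exactly two cells and bind them into a character matrix.
pairs <- do.call(rbind, Filter(function(v) length(v) == 2, cells))
print(pairs)
```

For the real document, `htmlParse()` would be given the file path instead of a string, and the length filter would likely need adjusting to whichever rows are actually wanted.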