I am reading data from this link.
> library(XML)
> url <- "http://biostat.jhsph.edu/~jleek/contact.html"
> html <- htmlTreeParse(url, useInternalNodes=TRUE)
Then I want to extract the tenth line from it and count its characters. How can I do that?
Answer 0 (score: 0)
> url <- "http://biostat.jhsph.edu/~jleek/contact.html"
> html <- htmlTreeParse(url, useInternalNodes=TRUE)
> xpathSApply(html, "//div[@id = 'main']", xmlValue, trim = TRUE)
[1] "Contact Information\n\n\t\t\t Address \n\t\t\t \n\t\t\t Johns Hopkins University \n\t\t\t Bloomberg School of Public Health \n\t\t\t 615 North Wolfe Street \n\t\t\t Baltimore, MD 21205-2179 \n\t\t\t Phone\n\t\t\t 410-955-1166 (I am much easier to reach by email)\n\t\t\t Fax\n\t\t\t 410-955-0958\n\t\t\t Email\n\t\t\t jleek || jhsph dot edu \n\t\t\t Twitter\n\t\t\t @leekgroup\n\t\t\t Blog\n\t\t\t Simply Statistics"
Then wrap the call above in nchar() and assign the result to an object, here called characters.
> characters <- nchar(xpathSApply(html, "//div[@id = 'main']", xmlValue, trim = TRUE))
> characters
[1] 369
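Note that this count covers the whole text of the main div, not a single line. If the goal is the tenth line of the raw HTML source, as the question asks, one possible sketch uses base R's readLines() and indexes the result. The `src` vector below is a hypothetical stand-in for the page so the example runs offline; for the real page you would presumably pass `url("http://biostat.jhsph.edu/~jleek/contact.html")` as the connection instead.

```r
# Hypothetical stand-in for the page source (assumption, not the real page):
# twelve short lines of markup.
src <- sprintf("<p>row %d</p>", 1:12)

# Read the source line by line; a textConnection replaces the real url() here.
con <- textConnection(src)
page_lines <- readLines(con)
close(con)

# Take the tenth line and count its characters.
tenth <- page_lines[10]
tenth_chars <- nchar(tenth)
tenth_chars
```

Because readLines() returns one character string per line, `nchar(page_lines[c(10, 20)])` would count several lines in one call.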
You can use gsub() to remove the tab and newline markers.
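As a minimal sketch of that gsub() step, the example below collapses each run of spaces, tabs, and newlines into a single space before counting. The `txt` string is a shortened, hand-written stand-in for the xmlValue output, not the full page text, and the exact pattern is an assumption about what "removing" the markers should leave behind.

```r
# Shortened stand-in for the extracted text (assumption, not the full output).
txt <- "Contact Information\n\n\t\t\t Address \n\t\t\t Johns Hopkins University"

# Replace every run of spaces, tabs, and newlines with one space,
# then drop any leading or trailing whitespace.
clean <- trimws(gsub("[ \t\n]+", " ", txt))
clean

# Count the characters of the cleaned text.
clean_chars <- nchar(clean)
clean_chars
```

Replacing with a space rather than an empty string keeps neighbouring words apart; use `""` as the replacement if the whitespace should not be counted at all.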