I have a data frame that I'd like to use lapply on. For this example I have pulled the links out of its first column:
link <- c(
"http://www.r-statistics.com/tag/hadley-wickham/",
"http://had.co.nz/",
"http://vita.had.co.nz/articles.html",
"http://blog.revolutionanalytics.com/2010/09/the-r-files-hadley-wickham.html",
"http://www.analyticstory.com/hadley-wickham/"
)
The function I want to apply fetches the content of a link and stores it in a corpus [thanks to agstudy]:
create.corpus <- function(url.name){
doc=htmlParse(link)
parag=xpathSApply(doc,'//p',xmlValue)
cc=Corpus(VectorSource(parag))
meta(cc,type='corpus','link')=link
return(cc)
}
But I can't get the function to work through lapply:
cc=lapply(link,create.corpus) # does not work
cc=lapply(link,nchar) # works
link=link[1] # try on single element
cc=create.corpus(link) # works
Why doesn't this function work inside lapply?
Answer 0 (score: 3)
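The problem is in your function: its body never uses the argument url.name and reads the global variable link instead. Under lapply, every call therefore receives the whole five-element vector rather than a single URL. It only appeared to work after link=link[1] because the global then held a single URL. Replace all instances of link inside the function with url.name and it will work.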
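library(XML)  # htmlParse, xpathSApply, xmlValue
library(tm)   # Corpus, VectorSource, meta
create.corpus <- function(url.name){
  # Parse the single URL passed in, not the global `link` vector
  doc <- htmlParse(url.name)
  # Extract the text of every <p> element
  parag <- xpathSApply(doc, '//p', xmlValue)
  cc <- Corpus(VectorSource(parag))
  # Record the source URL as corpus-level metadata
  meta(cc, type='corpus', 'link') <- url.name
  return(cc)
}
cc <- lapply(link, create.corpus)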
Result:
> cc
[[1]]
A corpus with 48 text documents
[[2]]
A corpus with 2 text documents
[[3]]
A corpus with 41 text documents
[[4]]
A corpus with 25 text documents
[[5]]
A corpus with 39 text documents
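To see the scoping issue in isolation, without fetching any pages, here is a minimal sketch. The placeholder URLs and the helper names uses.global and uses.argument are hypothetical and only for illustration; they stand in for the original and the corrected create.corpus.

```r
# Hypothetical stand-in for the `link` vector; no network access is needed.
link <- c("http://a.example/", "http://b.example/", "http://c.example/")

# Mirrors the bug in the question: the argument is ignored and the
# global `link` is read instead.
uses.global <- function(url.name) {
  length(link)
}

# Mirrors the fix: only the argument is used.
uses.argument <- function(url.name) {
  length(url.name)
}

bad  <- lapply(link, uses.global)
good <- lapply(link, uses.argument)

print(unlist(bad))   # 3 3 3: every call saw the whole vector
print(unlist(good))  # 1 1 1: every call saw a single URL
```

If link were first reassigned to link[1], uses.global would also return 1, which is why the single-element test in the question seemed to succeed.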