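I am trying to write a function that creates a word cloud. It takes two arguments: a data frame and one of its columns. However, there is a bug in the first statement. I want DataFrame$Column to be passed as the argument to VectorSource. What is the best way to achieve this?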
createsWordcloud <- function(df, col) {
  # An Object of Class VectorSource which extends the Class Source representing a vector where each entry is interpreted as a document.
  # Every Element of the Corpus is stored as a Document...
  # The Bug is right here!..
  corpus <- Corpus(VectorSource(paste(df, "$", col, sep="")))
  # Convert the Corpus to Plain Text Document
  corpus <- tm_map(corpus, PlainTextDocument)
  # Remove Punctuation & STOPWORDS...
  # STOPWORDs are commonly used words in the English Language... i.e. I, me, my
  # To view the full list of STOPWORDS, type stopwords('english') in the Console...
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords('english'))
  # Next we perform STEMMING... All the words will be converted to their stem
  # i.e. learning -> learn, walked -> walk
  # These Words will be Plotted Only Once!
  corpus <- tm_map(corpus, stemDocument)
  wordcloud(corpus, max.words=100, random.order=FALSE)
  # These parameters are used to limit the number of words plotted.
  # max.words will plot the specified number of words and discard least frequent terms,
  # whereas, min.freq will discard all terms whose frequency is below the specified value.
}
Answer 0 (score: 0)
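There are two ways to do this; one of them uses non-standard evaluation. Also, for your particular task it may be enough to pass df$col directly and have the function take a vector instead of a data frame, since the code you posted only uses that one column.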
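A minimal sketch of that vector-taking variant, assuming the tm, SnowballC, and wordcloud packages are installed. The name createsWordcloudFromText and the maxWords parameter are made up for illustration, and the PlainTextDocument step from the question is left out here as an assumption about what the pipeline needs:

```r
# Hypothetical variant of the question's function: it receives the text
# vector itself, so no column lookup happens inside the function.
createsWordcloudFromText <- function(text, maxWords = 100) {
  # Fail early on input that is clearly not text.
  stopifnot(is.character(text) || is.factor(text))

  # Each element of the vector becomes one document in the corpus.
  corpus <- tm::Corpus(tm::VectorSource(as.character(text)))
  corpus <- tm::tm_map(corpus, tm::removePunctuation)
  corpus <- tm::tm_map(corpus, tm::removeWords, tm::stopwords("english"))
  corpus <- tm::tm_map(corpus, tm::stemDocument)  # stemming relies on SnowballC

  wordcloud::wordcloud(corpus, max.words = maxWords, random.order = FALSE)
}

# Usage, assuming a data frame `df` with a text column named `col`:
# createsWordcloudFromText(df$col)
```

The caller decides which column to use, so the function never has to know about the data frame at all.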
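If you really do need to pass in the column name, the standard approach is to pass it as a string and use the subsetting operator ([.data.frame) to look it up: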
readcol <- function(df, col) {
  df[, col]
}
Then:
> readcol(data.frame(x=1:10), "x")
[1] 1 2 3 4 5 6 7 8 9 10
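To tie this back to the question, here is a small base R sketch of why the paste() call fails and what string-based extraction returns instead. The data frame and its review column are invented for the example:

```r
# Illustrative data; the column name "review" is an assumption.
df <- data.frame(review = c("learning R", "walked home"),
                 stringsAsFactors = FALSE)
col <- "review"

# What the question's code builds: paste() produces a character string
# that merely looks like code; it never extracts the column.
pasted <- paste(df, "$", col, sep = "")

# String-based extraction: both forms return the column as a plain vector
# for an ordinary data.frame.
viaBracket <- df[, col]
viaDoubleBracket <- df[[col]]
```

For a plain data.frame the two forms agree. df[[col]] is arguably the safer choice, because df[, col] returns a one-column tibble rather than a vector when df is a tibble. In the question's function, the first statement would then become Corpus(VectorSource(df[[col]])).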
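If you really don't want to quote the column name, you need to evaluate the col argument lazily so that it is looked up inside the data frame: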
readcol.nse <- function(df, col) {
  eval(substitute(col), df, parent.frame())
}
Then:
> readcol.nse(data.frame(x=1:10), x)
[1] 1 2 3 4 5 6 7 8 9 10
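The standard warning applies here: be very careful with non-standard evaluation. It is hard to use programmatically (because passing the column name on to another function is tricky), and it can have unintuitive side effects with more complex expressions. The string form is a little clunkier, but it composes much more easily.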
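A short sketch of the composition problem, using the two helpers from this answer (readcol is written with [[ here) plus two hypothetical wrappers, firstValue and firstValue.nse. The data frame deliberately has a column named col to make the pitfall visible:

```r
readcol <- function(df, col) df[[col]]
readcol.nse <- function(df, col) eval(substitute(col), df, parent.frame())

# Illustrative data with a column that happens to be called "col".
df2 <- data.frame(x = 1:3, col = 7:9)

# String version: the wrapper simply forwards the name, and it composes.
firstValue <- function(df, col) readcol(df, col)[1]

# Naive NSE wrapper: inside readcol.nse, substitute(col) captures the
# symbol `col` from this wrapper's call, not the `x` the user typed.
firstValue.nse <- function(df, col) readcol.nse(df, col)[1]

firstValue(df2, "x")    # first value of column x
firstValue.nse(df2, x)  # returns 7, the first value of the `col` column
```

Without a col column in the data frame, the naive wrapper would fall back to whatever x means in the caller's environment, or fail with an "object not found" error, which is just as surprising.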