Question

I am writing some code in which a column name (for example "Category") is supplied by the user and assigned to the variable biz.area. For example:

biz.area <- "Category"

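The original data frame is saved as risk.data. The user also specifies the range of columns to analyze by supplying column names for the variables first.column and last.column.

The text in these columns is split into bigrams for further text analysis, including tf_idf.

My analysis code is as follows.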

x.bigrams <- risk.data %>% 
  gather(fields, alldata, first.column:last.column) %>% 
  unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>% 
  count(bigrams, biz.area, sort=TRUE) %>%
  bind_tf_idf(bigrams, biz.area, n) %>%
  arrange(desc(tf_idf))

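However, I get the following error.

Error in grouped_df_impl(data, unname(vars), drop) : Column x.biz.area is unknown

This is because count() expects a literal column name rather than the variable biz.area. If I use count_() instead, I get the following error.

Error in compat_lazy_dots(vars, caller_env()) : object 'bigrams' not found

This is because count_() expects only variables, and bigrams is not a variable.

How can I pass both a constant and a variable to count() or count_()?

Thanks for your advice!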

Answer 1

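It looks to me like you need to quote the arguments, so that column names can be passed as variables rather than as strings or values. Since you are already using dplyr, you can use dplyr's non-standard evaluation techniques.

Try something along these lines: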

library(tidyverse)
library(tidytext)  # provides unnest_tokens() and bind_tf_idf()

analyze_risk <- function(area, firstcol, lastcol) {

    # capture the arguments as quosures (unevaluated column names)
    areaq     <- enquo(area)
    firstcolq <- enquo(firstcol)
    lastcolq  <- enquo(lastcol)

    # run the analysis on the risk data, unquoting each quosure with !!
    risk.data %>%
      gather(fields, alldata, (!!firstcolq):(!!lastcolq)) %>%
      unnest_tokens(bigrams, alldata, token = "ngrams", n = 2) %>%
      count(bigrams, !!areaq, sort = TRUE) %>%
      bind_tf_idf(bigrams, !!areaq, n) %>%
      arrange(desc(tf_idf))
}

In this case, your users would pass bare column names into the function, like this:

myresults  <- analyze_risk(Category, Name_of_Firstcol, Name_of_Lastcol)
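To see the quote-and-unquote step in isolation, here is a minimal sketch on a small made-up data frame. The data, the column values, and the helper name count_by() are all hypothetical and only serve the illustration; the point is just that enquo() captures the bare name and !! puts it back inside count().

```r
library(dplyr)

# Hypothetical toy data standing in for the tokenised risk data
toy <- tibble(
  Category = c("A", "A", "B"),
  bigrams  = c("credit risk", "credit risk", "market risk")
)

# count_by() is an illustrative helper, not part of dplyr
count_by <- function(data, area) {
  areaq <- enquo(area)                 # capture the bare column name
  data %>%
    count(bigrams, !!areaq, sort = TRUE)  # unquote it inside count()
}

result <- count_by(toy, Category)
print(result)
```

The fixed column bigrams and the user-chosen column sit side by side in the same count() call, which is the combination the question asks about.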

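If you would rather have users pass in strings, convert each string to a symbol with rlang::sym() and unquote it with !!, instead of capturing it with enquo(). rlang::expr() would not help here, because it captures the expression as typed rather than the column that the string names.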
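For the string case, which matches the question's setup where biz.area already holds "Category", one possible approach is to turn each string into a symbol with rlang::sym() and unquote it. The sketch below is an assumption-laden illustration: the toy data frame and its column names (Notes1, Notes2) are invented, and the tokenising step is left out so that only dplyr, tidyr, and rlang are needed.

```r
library(dplyr)
library(tidyr)
library(rlang)

# Hypothetical toy data; column names are invented for illustration
toy <- tibble(
  Category = c("A", "B"),
  Notes1   = c("late payment", "system outage"),
  Notes2   = c("late payment", "data breach")
)

# Column names supplied by the user as strings, as in the question
biz.area     <- "Category"
first.column <- "Notes1"
last.column  <- "Notes2"

# sym() turns each string into a symbol; !! splices it into the call
long <- toy %>%
  gather(fields, alldata, (!!sym(first.column)):(!!sym(last.column)))

# A fixed column (alldata) and a user-supplied one in the same count()
counts <- long %>%
  count(alldata, !!sym(biz.area), sort = TRUE)

print(counts)
```

The same !!sym(biz.area) expression should also work as the document argument of bind_tf_idf() in the original pipeline, since that function takes its column arguments the same way count() does.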
