Question

I am trying to write a function in R that summarizes a table. The sample function below uses the iris data as a test.

As the output shows, the first variable in the table is called "data[[by_var_nm]]" instead of "Species". Is there any way to keep the original variable name during the summarizing step?

test_func <- function(data, by_var_nm) {
  by_var_nm <- deparse(substitute(by_var_nm))

  tbl_test_sum <- data %>% 
    group_by(data[[by_var_nm]]) %>% 
    summarise(
      count = n()
    )
  tbl_test_sum
}

test_func(iris, Species)

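Thanks.

Thank you all for the very helpful answers. I tried the solutions, and snoram's answer seems to solve my initial problem well. However, after putting everything together, I could not get the plot at the end to work properly. My idea is to plot the percentage distribution of "var_nm", grouped by "by_var_nm". The problem is that the bars and the percentage data labels do not line up correctly.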

Output of test_func(iris, Species) with the original function:

# A tibble: 3 x 2
  `data[[by_var_nm]]` count
  <fct>               <int>
1 setosa                 50
2 versicolor             50
3 virginica              50
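
The unexpected name can be reproduced outside the function. As far as I understand it, dplyr labels an unnamed grouping expression with its own source text, so the literal expression becomes the column name. A minimal sketch, assuming dplyr is installed; by_var, unnamed, and named are made-up names for illustration:

```r
library(dplyr)

by_var <- "Species"  # hypothetical stand-in for the deparsed argument

# Unnamed expression: the grouping column is labelled with the expression text
unnamed <- iris %>%
  group_by(iris[[by_var]]) %>%
  summarise(count = n())
names(unnamed)

# Named expression: `!!by_var :=` uses the string stored in by_var as the name
named <- iris %>%
  group_by(!!by_var := iris[[by_var]]) %>%
  summarise(count = n())
names(named)
```

Naming the expression explicitly is one way to keep "Species" as the column name without renaming afterwards.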

Answer 1

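I suggest a solution similar to Alexandre's, but one that also drops the dplyr dependency. If you intend to keep this function around, I think an unnecessary dependency is not a good idea.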

test_func <- function(data, by_var_nm) {
  by_var_nm <- deparse(substitute(by_var_nm))
  tbl_test_sum <- as.data.frame(table(data[[by_var_nm]]))
  names(tbl_test_sum) <- c(by_var_nm, "count")
  tbl_test_sum
}

Speed:

> microbenchmark::microbenchmark(test_func_Alex(iris, Species), test_func_snoram(iris, Species), unit = "relative")
Unit: relative
                            expr      min       lq     mean   median       uq      max neval cld
   test_func_Alex(iris, Species) 6.910679 6.834064 5.827796 5.622154 5.480321 4.009469   100   b
 test_func_snoram(iris, Species) 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100  a
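
For reference, here is a usage sketch of the base R version under the name test_func_snoram that the benchmark call uses. That the benchmarked function is the one defined in this answer is an assumption. It needs no packages and returns a plain data.frame rather than a tibble:

```r
# Same body as the function in this answer, renamed to match the benchmark call
test_func_snoram <- function(data, by_var_nm) {
  # Capture the unquoted column name as a string, e.g. "Species"
  by_var_nm <- deparse(substitute(by_var_nm))
  # table() counts each level; as.data.frame() yields columns Var1 and Freq
  tbl_test_sum <- as.data.frame(table(data[[by_var_nm]]))
  # Restore the original variable name on the first column
  names(tbl_test_sum) <- c(by_var_nm, "count")
  tbl_test_sum
}

res <- test_func_snoram(iris, Species)
print(res)
#      Species count
# 1     setosa    50
# 2 versicolor    50
# 3  virginica    50
```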

Answer 2

You can use the rlang quotation syntax, which is designed for this use case (the linked examples are also worth reading):

library(rlang); library(dplyr)

test_func <- function(data, by_var_nm) {
    by_var_nm <- enquo(by_var_nm)

    tbl_test_sum <- data %>% 
        group_by(!!by_var_nm) %>% 
        summarise(
            count = n()
        )
    tbl_test_sum
}

test_func(iris, Species)

# A tibble: 3 x 2
#  Species    count
#  <fct>      <int>
#1 setosa        50
#2 versicolor    50
#3 virginica     50
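
Newer rlang releases (0.4.0 and later) also offer the embrace operator {{ }}, which combines the enquo() and !! steps into one. A minimal sketch, assuming a dplyr installation that ships with such an rlang; test_func_embrace is a made-up name for illustration:

```r
library(dplyr)

test_func_embrace <- function(data, by_var_nm) {
  data %>%
    # {{ }} forwards the caller's unquoted column, so its name is preserved
    group_by({{ by_var_nm }}) %>%
    summarise(count = n())
}

res <- test_func_embrace(iris, Species)
names(res)
```

The call site is unchanged: the column is still passed unquoted, as in test_func(iris, Species).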

Answer 3

I don't know why this happens, but you can get the name back with this trick:

test_func <- function(data, by_var_nm) {
  by_var_nm <- deparse(substitute(by_var_nm))

  tbl_test_sum <- data %>% 
    group_by(data[[by_var_nm]]) %>% 
    summarise(
      count = n()
    )
  names(tbl_test_sum)[grep("by_var_nm",names(tbl_test_sum))] <- by_var_nm
  tbl_test_sum
}

test_func(iris, Species)

You can also use the index names(tbl_test_sum)[1], assuming group_by() creates the first column for this variable.

Hope this helps.
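
If renaming after the fact feels fragile, one alternative is to avoid the odd name in the first place. This is a sketch assuming dplyr 1.0 or later, where the .data pronoun can be subset with a string and the grouping column keeps that string as its name; test_func_pronoun is a made-up name for illustration:

```r
library(dplyr)

test_func_pronoun <- function(data, by_var_nm) {
  # Turn the unquoted argument into a string, e.g. "Species"
  by_var_nm <- deparse(substitute(by_var_nm))
  data %>%
    # .data refers to the data frame being grouped, not the `data` argument
    group_by(.data[[by_var_nm]]) %>%
    summarise(count = n())
}

res <- test_func_pronoun(iris, Species)
names(res)
```

Because no rename is needed, this does not depend on the grouping column's position or on matching "by_var_nm" with grep().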

Preserving the original variable name when grouping and summarizing with summarize() in an R function

3 answers: