plyr :: ddply在dplyr中等效

时间:2019-02-19 20:16:55

标签: r dplyr

我个人在plyr之前学过dplyr,并且我试图尽可能将代码规范化为dplyr语法,但是我陷入了以下用例:

ddply(
    .data = somedataframe, 
    .variables = c('var1', 'var2'),
    .function = 
        function(thisdf){
            ...
        }
)

函数调用中的...是数据帧的任意复杂的修改。请注意,ddplydlply(或任何其他dxply)的选择纯粹是为了说明。 dplyr中是否存在一个函数(目前称为dplyr::f),还可能需要一个任意的修改函数?例如:

somedataframe %>% 
    group_by(var1, var2) %>% 
    dplyr::f(.function = function(thisdf){ ... })

在研究此功能时,我能找到的所有示例都是summarise的极其简单的ddply实现。

1 个答案:

答案 0 :(得分:0)

可能最简单的方法是使用dplyr::do()函数,但也可以使用group_map()。完整示例:

library(tidyverse)

#some complex function
func = function(x) {
  mod = lm(Sepal.Length ~ Petal.Width, data = x)
  mod_coefs = broom::tidy(mod)

  tibble(
    mean_sepal_length = mean(x$Sepal.Length),
    mean_petal_width = mean(x$Petal.Width), 
    slope = mod_coefs[[2, 2]],
    slope_p = mod_coefs[[2, 5]]
  )
}

#plyr version
plyr::ddply(iris, "Species", func)

#dplyr with do()
iris %>% 
  group_by(Species) %>% 
  do(func(.))

#dplyr with group_map()
#have to rewrite the function to take a second argument, which is the grouping variable
func2 = function(x, y) {
  mod = lm(Sepal.Length ~ Petal.Width, data = x)
  mod_coefs = broom::tidy(mod)

  tibble(
    mean_sepal_length = mean(x$Sepal.Length),
    mean_petal_width = mean(x$Petal.Width), 
    slope = mod_coefs[[2, 2]],
    slope_p = mod_coefs[[2, 5]]
  )
}

iris %>% 
  group_by(Species) %>% 
  group_map(func2)

这些产品:

     Species mean_sepal_length mean_petal_width     slope      slope_p
1     setosa             5.006            0.246 0.9301727 5.052644e-02
2 versicolor             5.936            1.326 1.4263647 4.035422e-05
3  virginica             6.588            2.026 0.6508306 4.798149e-02

# A tibble: 3 x 5
# Groups:   Species [3]
  Species    mean_sepal_length mean_petal_width slope   slope_p
  <fct>                  <dbl>            <dbl> <dbl>     <dbl>
1 setosa                  5.01            0.246 0.930 0.0505   
2 versicolor              5.94            1.33  1.43  0.0000404
3 virginica               6.59            2.03  0.651 0.0480   

# A tibble: 3 x 5
# Groups:   Species [3]
  Species    mean_sepal_length mean_petal_width slope   slope_p
  <fct>                  <dbl>            <dbl> <dbl>     <dbl>
1 setosa                  5.01            0.246 0.930 0.0505   
2 versicolor              5.94            1.33  1.43  0.0000404
3 virginica               6.59            2.03  0.651 0.0480   

有2个区别。 ddply()输出是标准数据帧,即使函数输出了小标题。尽管分组已“使用”,但 dplyr 输出仍是分组的小标题。