I learned dplyr before plyr, and I try to standardize my code on dplyr syntax wherever possible, but I am stuck on the following use case:
ddply(
  .data = somedataframe,
  .variables = c('var1', 'var2'),
  .fun =
    function(thisdf){
      ...
    }
)
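The ... in the function body stands for an arbitrarily complex modification of the data frame. Note that the choice of ddply over dlply (or any other dxply) is purely for illustration. Is there a function in dplyr (call it dplyr::f for now) that can likewise take an arbitrary modification function? For example: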
somedataframe %>%
  group_by(var1, var2) %>%
  dplyr::f(.function = function(thisdf){ ... })
While researching this, every example I could find was an extremely simple ddply call that maps directly onto summarise.
Answer 0 (score: 0)
Probably the simplest way is to use dplyr::do(), but group_map() also works. Full example:
library(tidyverse)

#some complex function
func = function(x) {
  mod = lm(Sepal.Length ~ Petal.Width, data = x)
  mod_coefs = broom::tidy(mod)
  tibble(
    mean_sepal_length = mean(x$Sepal.Length),
    mean_petal_width = mean(x$Petal.Width),
    slope = mod_coefs[[2, 2]],
    slope_p = mod_coefs[[2, 5]]
  )
}

#plyr version
plyr::ddply(iris, "Species", func)

#dplyr with do()
iris %>%
  group_by(Species) %>%
  do(func(.))

#dplyr with group_map()
#have to rewrite the function to take a second argument, which is the grouping variable
#note: the output shown is from dplyr 0.8.0; from 0.8.1 on, group_map() returns a list
#and group_modify() gives the grouped tibble instead
func2 = function(x, y) {
  mod = lm(Sepal.Length ~ Petal.Width, data = x)
  mod_coefs = broom::tidy(mod)
  tibble(
    mean_sepal_length = mean(x$Sepal.Length),
    mean_petal_width = mean(x$Petal.Width),
    slope = mod_coefs[[2, 2]],
    slope_p = mod_coefs[[2, 5]]
  )
}

iris %>%
  group_by(Species) %>%
  group_map(func2)
These produce (plyr first, then do(), then group_map()):
     Species mean_sepal_length mean_petal_width     slope      slope_p
1     setosa             5.006            0.246 0.9301727 5.052644e-02
2 versicolor             5.936            1.326 1.4263647 4.035422e-05
3  virginica             6.588            2.026 0.6508306 4.798149e-02

# A tibble: 3 x 5
# Groups:   Species [3]
  Species    mean_sepal_length mean_petal_width slope   slope_p
  <fct>                  <dbl>            <dbl> <dbl>     <dbl>
1 setosa                  5.01            0.246 0.930 0.0505
2 versicolor              5.94            1.33  1.43  0.0000404
3 virginica               6.59            2.03  0.651 0.0480

# A tibble: 3 x 5
# Groups:   Species [3]
  Species    mean_sepal_length mean_petal_width slope   slope_p
  <fct>                  <dbl>            <dbl> <dbl>     <dbl>
1 setosa                  5.01            0.246 0.930 0.0505
2 versicolor              5.94            1.33  1.43  0.0000404
3 virginica               6.59            2.03  0.651 0.0480
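There are 2 differences. The ddply() output is a standard data frame, even though the function returns a tibble. The dplyr output is still a grouped tibble, even though the grouping has already been "used".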
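If the leftover grouping is unwanted, a minimal sketch along the following lines may help. It assumes dplyr 0.8.1 or later, where, as far as I know, group_modify() is the verb that returns a grouped tibble and group_map() returns a plain list of per-group results. The helper name summarise_group is made up for this illustration, and it reads the coefficients from base R's summary() rather than broom so that dplyr is the only package needed.

```r
library(dplyr)

# Hypothetical helper (name is illustrative): .x holds one group's rows
# without the grouping column, .y is a one-row tibble with the group key.
summarise_group <- function(.x, .y) {
  mod <- lm(Sepal.Length ~ Petal.Width, data = .x)
  # Coefficient matrix: one row per term, columns include
  # "Estimate" and "Pr(>|t|)"
  coefs <- coef(summary(mod))
  tibble(
    mean_sepal_length = mean(.x$Sepal.Length),
    mean_petal_width  = mean(.x$Petal.Width),
    slope             = coefs["Petal.Width", "Estimate"],
    slope_p           = coefs["Petal.Width", "Pr(>|t|)"]
  )
}

result <- iris %>%
  group_by(Species) %>%
  group_modify(summarise_group) %>%  # one output row per group, key column prepended
  ungroup()                          # drop the grouping that is no longer needed

result
```

Calling as.data.frame(result) afterwards should give a plain data frame, which is closer to what ddply() returns.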