Question

I have functions that compute models for various products and another function that formats the results. I use them in two steps:

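1. Group the data by product (and possibly another variable) and use do() to create a data frame that holds each product's model output.
2. Use do() again to create a data frame of formatted results.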

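The problem is that the formatting function I use in step 2 receives its input in a different format depending on whether the data frame is grouped with rowwise() or with group_by(). Here is a reproducible example in which you can see the output of str():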

Step 1:

library(dplyr)

mockmodel <- function(dtf){
    list(a = mean(dtf$Petal.Length),
         b = as.character(unique(dtf$Species)))
}
iris2 <- iris %>% 
    group_by(Species) %>% 
    do(x = mockmodel(.))

Step 2:

formatoutput <- function(dtf){
    data.frame(aisinthere = !is.null(dtf$x[[1]]$a),
               meanpetallength = dtf$x[[1]]$a)
}

On the rowwise() data frame, it gives an error:

iris2 %>% do(formatoutput(.)) 

Error in dtf$x[[1]]$a : $ operator is invalid for atomic vectors
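The error can be reproduced without dplyr. This is a minimal sketch, assuming that inside do() on the rowwise data frame `dtf$x` is the bare model list (as the str() output further down shows): `[[1]]` then extracts the number stored in `a`, and `$` cannot be applied to a number. The variable names are illustrative only.

```r
# Shape of dtf$x inside do() on the rowwise data frame, mimicked with a plain list
x_rowwise <- list(a = 1.462, b = "setosa")

# [[1]] returns the first element itself (the number 1.462), not a list
first <- x_rowwise[[1]]
print(is.atomic(first))

# Applying $ to that number raises the same error that formatoutput() reports
msg <- tryCatch(first$a, error = function(e) conditionMessage(e))
print(msg)
```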

On the group_by() data frame, it works but issues a warning:

iris2 %>% group_by(Species) %>% do(formatoutput(.)) 

Source: local data frame [3 x 3]
Groups: Species

     Species aisinthere meanpetallength
1     setosa       TRUE           1.462
2 versicolor       TRUE           4.260
3  virginica       TRUE           5.552
Warning message:
Grouping rowwise data frame strips rowwise nature

Data structure:

strx <- function(dtf){
    str(dtf$x) # display structure
    data.frame() # do() needs a data frame as the return value
}
iris2 %>% do(strx(.))

List of 2
 $ a: num 1.46
 $ b: chr "setosa"
List of 2
 $ a: num 4.26
 $ b: chr "versicolor"
List of 2
 $ a: num 5.55
 $ b: chr "virginica"
Source: local data frame [0 x 0]
Groups: <by row>

iris2 %>% group_by(Species) %>% do(strx(.))

List of 1
 $ :List of 2
  ..$ a: num 1.46
  ..$ b: chr "setosa"
List of 1
 $ :List of 2
  ..$ a: num 4.26
  ..$ b: chr "versicolor"
List of 1
 $ :List of 2
  ..$ a: num 5.55
  ..$ b: chr "virginica"
Source: local data frame [0 x 1]
Groups: Species

Variables not shown: Species (fctr)
Warning message:
Grouping rowwise data frame strips rowwise nature

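The output of str() shows that the structure of dtf$x differs depending on whether do() is used after group_by() or after rowwise(). How can I avoid this and use the same function both on data frames grouped with rowwise() and on data frames grouped with group_by(Species, another_variable)? My current workaround is to call group_by() again before the second do() operation. In fact, to avoid the warning about "strips rowwise nature", I even use ungroup() %>% group_by(item) before the second do() operation. Is there a better way?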
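One possible approach, sketched here as an assumption and not as a dplyr feature: make the formatting function normalize its input first. `unwrap_model()` and `formatoutput2()` are hypothetical names. The helper unwraps `dtf$x` when it is an unnamed length-1 list holding a list (the group_by() shape) and leaves it alone otherwise (the rowwise() shape). It assumes a genuine model output is never itself an unnamed length-1 list. The two input shapes are mimicked with plain lists so the sketch runs without dplyr.

```r
# Hypothetical helper (not part of dplyr): return the model output list whether
# do() passed it bare (rowwise input) or wrapped in an unnamed length-1 list
# (group_by() input), matching the two str() outputs shown above.
unwrap_model <- function(x) {
  unnamed <- is.null(names(x)) || all(names(x) == "")
  wrapped <- is.list(x) && length(x) == 1L && unnamed && is.list(x[[1]])
  if (wrapped) x[[1]] else x
}

# Formatting function whose behavior no longer depends on the grouping
formatoutput2 <- function(dtf) {
  m <- unwrap_model(dtf$x)
  data.frame(aisinthere = !is.null(m$a),
             meanpetallength = m$a)
}

# The two shapes from the str() output, mimicked with plain lists
dtf_rowwise <- list(x = list(a = 1.462, b = "setosa"))
dtf_grouped <- list(x = list(list(a = 1.462, b = "setosa")))

res_rowwise <- formatoutput2(dtf_rowwise)
res_grouped <- formatoutput2(dtf_grouped)
print(identical(res_rowwise, res_grouped))
```

With dplyr loaded, `formatoutput2` would be used in place of `formatoutput` in step 2, for example `iris2 %>% do(formatoutput2(.))`. This has not been verified here against every dplyr version.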

This part of the dplyr::do() documentation may be relevant. It states that the output of the first do() is grouped with rowwise() by default:

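"Groups are preserved for a single unnamed input. This is different to summarise() because do() generally does not reduce the complexity of the data, it just expresses it in a special way. For multiple named inputs, the output is grouped by row with rowwise(). This allows other verbs to work in an intuitive way."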

When using do() on the output of do(), how can I keep the same list structure for both rowwise() data frames and group_by() data frames?
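Another option, again a sketch under stated assumptions rather than an established dplyr idiom, sidesteps the second do() entirely. Step 1 already stores one model output per group in the list-column `x`, so the formatting function can take a single model output and be applied with lapply(). The grouping of the data frame then no longer affects the function's input. To keep the sketch runnable without dplyr, base R split() stands in for group_by() %>% do(). `formatmodel` is a hypothetical name.

```r
# Same mock model as in step 1
mockmodel <- function(dtf){
    list(a = mean(dtf$Petal.Length),
         b = as.character(unique(dtf$Species)))
}

# Step 1 without dplyr: one model output per species, collected in a list
models <- lapply(split(iris, iris$Species), mockmodel)

# Hypothetical formatting function that receives a single model output
formatmodel <- function(m) {
    data.frame(aisinthere = !is.null(m$a),
               meanpetallength = m$a,
               Species = m$b)
}

# Step 2: format every model output and stack the rows into one data frame
formatted <- do.call(rbind, lapply(models, formatmodel))
rownames(formatted) <- NULL
print(formatted)
```

With the dplyr result from step 1, the same function should apply directly to the list-column, for example `do.call(rbind, lapply(iris2$x, formatmodel))`. Because each call then receives exactly one model output, the rowwise() and group_by() shapes no longer matter.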

第1步：

第2步。

on rowwise（）数据框

on group_by（）数据框

数据结构

0 个答案: