Question

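I know that after group_by I can use summarise to compute frequencies, sums, means, medians, standard deviations, and so on. I would like to know whether I can also compute a probability distribution when summarising. For example: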

dat%>%group_by(A, B)%>%summarise(C_dist = density(C))

I tried this in R, but I get the following error.

Error in summarise_impl(.data, dots) : 
  Evaluation error: need at least 2 points to select a bandwidth automatically.

There are no missing values in my column.

Answer 1

I would rather use tapply().

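The tryCatch part ensures that when a group has only one member, NA is returned for that group instead of the whole computation stopping with an error.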

set.seed(1)
n <- 20
dtf <- data.frame(d=runif(n), 
                 g1=sample(1:3, n, replace=TRUE), 
                 g2=sample(c("A", "B"), n, replace=TRUE))

agg <- with(dtf, 
         tapply(d, list(g1, g2), 
         FUN=function(x) {
             tryCatch(density(x), error=function(e) NA)
         }))

str(agg)
agg[["2", "A"]]
# Call:
#    density.default(x = x)

# Data: x (3 obs.); Bandwidth 'bw' = 0.1733

#        x                 y           
#  Min.   :-0.2543   Min.   :0.008613  
#  1st Qu.: 0.1663   1st Qu.:0.156751  
#  Median : 0.5869   Median :0.699978  
#  Mean   : 0.5869   Mean   :0.593340  
#  3rd Qu.: 1.0074   3rd Qu.:0.902087  
#  Max.   : 1.4280   Max.   :1.199607
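
If you would rather stay with dplyr, the same idea can be sketched with a list column. This is only a sketch under some assumptions: it reuses the dtf example data from above, safe_density is a hypothetical helper name (not part of dplyr or base R), and the .groups argument assumes dplyr 1.0.0 or later. A density object is not a single value, so it is wrapped in list() to fit into one cell per group, and groups with fewer than 2 observations get NA, mirroring the tryCatch approach.

```r
library(dplyr)

# Same example data as in the tapply() version above
set.seed(1)
n_rows <- 20
dtf <- data.frame(d  = runif(n_rows),
                  g1 = sample(1:3, n_rows, replace = TRUE),
                  g2 = sample(c("A", "B"), n_rows, replace = TRUE))

# Hypothetical helper: density() needs at least 2 points to pick a
# bandwidth automatically, so return NA for smaller groups.
safe_density <- function(x) {
  if (length(x) < 2) NA else density(x)
}

agg <- dtf %>%
  group_by(g1, g2) %>%
  summarise(n_obs  = n(),
            d_dist = list(safe_density(d)),  # list() keeps one object per group
            .groups = "drop")
```

Each element of agg$d_dist is then either a density object or NA, so agg$d_dist[[1]] returns the estimate for the first group and can be passed to plot() when it is not NA.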
