Using a function with multiple outputs (a non-summary function?) with summarize

Time: 2016-05-26 17:37:20

Tags: r function dplyr

library(OptimalCutpoints)

library(dplyr)

Here is the test data:

set.seed(123)

df<-data.frame(label=rbinom(1000,size=1,prob=0.5),score=rnorm(1000),type=sample(c("A","B","C","D"),1000,replace=TRUE))

I group by 'type' with group_by and want to summarize using the optimal.cutpoints function from the OptimalCutpoints library:

df%>%group_by(type)%>%summarize(Opt_cut=optimal.cutpoints(X = "score", status = "label", tag.healthy = 0, methods = "MaxSpSe",data=df[,1:2]))

I get this: Error: expecting a single value

As a workaround, I can extract each 'type' and run optimal.cutpoints on it separately:

df_A<-df%>%filter(grepl("A",type))
opt.cut.df.A <- optimal.cutpoints(X = "score", status = "label", tag.healthy = 0, methods = "MaxSpSe", data = df_A)

From opt.cut.df.A I can extract the optimal cutoff like this:

opt.cut.df.A[1]$MaxSpSe$Global$optimal.cutoff$cutoff

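But this is definitely not the best way, especially with a large number of 'type' values. Unless I'm missing something, it looks like summarize only works with functions that return a single value.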

Question: how can I use summary.cutpoint or a similar function with summarize?

3 answers:

Answer 0: (score: 2)

Another option, using purrr:

library(purrr)

df %>%
  split(.$type) %>%
  map(~ optimal.cutpoints(X = "score", status = "label", 
                          tag.healthy = 0, methods = "MaxSpSe", data = .)) %>%
  map(c("MaxSpSe", "Global", "optimal.cutoff", "cutoff"))

This gives:

#$A
#[1] -0.0768659
#
#$B
#[1] 0.1612264 0.1830480
#
#$C
#[1] -0.08671413
#
#$D
#[1] 0.1071904 0.1155321 0.1390979

If you want the result in a data.frame, you can add map_df to the chain:

df %>%
    split(.$type) %>%
    map(~optimal.cutpoints(X = "score", status = "label", 
                           tag.healthy = 0, methods = "MaxSpSe", data = .)) %>% 
    map(c("MaxSpSe", "Global", "optimal.cutoff", "cutoff")) %>% 
    map_df(~data.frame(cutoff = .), .id = "type")

This gives:

#  type      cutoff
#1    A -0.07686590
#2    B  0.16122635
#3    B  0.18304797
#4    C -0.08671413
#5    D  0.10719041
#6    D  0.11553210
#7    D  0.13909786

Answer 1: (score: 1)

You can also use a split-apply approach to build a list of models, then extract the value from each element of the list.

listOfModels <- lapply(split(df, df$type), function(subDf) 
                       optimal.cutpoints(X = "score", status = "label", 
                                         tag.healthy = 0, methods = "MaxSpSe",data=subDf))

lapply(listOfModels, function(model) model[1]$MaxSpSe$Global$optimal.cutoff$cutoff)

$A
[1] -0.0768659

$B
[1] 0.1612264 0.1830480

$C
[1] -0.08671413

$D
[1] 0.1071904 0.1155321 0.1390979
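
If a data frame is more convenient than a list, base R's stack() can flatten a named list of vectors. The following is only a sketch: the values are copied by hand from the output above so that the snippet runs on its own, and the names cutoffs and cutoff_df are illustrative. In practice the list would be the result of the second lapply() call.

```r
# Values copied from the output shown above (assumption: stands in for the
# list returned by lapply(listOfModels, ...)).
cutoffs <- list(A = -0.0768659,
                B = c(0.1612264, 0.1830480),
                C = -0.08671413,
                D = c(0.1071904, 0.1155321, 0.1390979))

# stack() returns a two-column data frame named "values" and "ind",
# with one row per vector element.
cutoff_df <- stack(cutoffs)

# Rename the columns to something more descriptive.
names(cutoff_df) <- c("cutoff", "type")
```

This keeps one row per cutoff, so groups with several cutoffs simply contribute several rows.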

Answer 2: (score: 1)

library(data.table)
setDT(df)[,opt(.SD), by=type]
   type          V1
1:    A -0.07686590
2:    D  0.10719041
3:    D  0.11553210
4:    D  0.13909786
5:    B  0.16122635
6:    B  0.18304797
7:    C -0.08671413

where opt is a helper function that extracts the cutoff(s):

opt <- function(df) optimal.cutpoints(X = "score", status = "label", tag.healthy = 0, methods = "MaxSpSe", data=df)[1]$MaxSpSe$Global$optimal.cutoff$cutoff

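The reason dplyr doesn't work is that some groups have a single cutoff while others have several. summarise expects exactly one value per group, so vectors of mixed length cause a problem.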
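
One way to stay within dplyr, given the explanation above, is to wrap each group's result in list() so that summarise sees a single (list) value per group, and then expand it. This is a sketch rather than a definitive solution: it assumes reasonably recent versions of dplyr and tidyr, rebuilds the test data from the question, redefines the opt helper so the snippet is self-contained, and uses the illustrative name cutoffs.

```r
library(OptimalCutpoints)
library(dplyr)
library(tidyr)

# Test data from the question.
set.seed(123)
df <- data.frame(label = rbinom(1000, size = 1, prob = 0.5),
                 score = rnorm(1000),
                 type  = sample(c("A", "B", "C", "D"), 1000, replace = TRUE))

# Helper from this answer: returns one or more cutoffs for a data frame.
opt <- function(d) {
  optimal.cutpoints(X = "score", status = "label", tag.healthy = 0,
                    methods = "MaxSpSe",
                    data = d)[1]$MaxSpSe$Global$optimal.cutoff$cutoff
}

cutoffs <- df %>%
  group_by(type) %>%
  # list() makes each group's result one list element, whatever its length;
  # score and label here are the current group's columns.
  summarise(cutoff = list(opt(data.frame(score = score, label = label)))) %>%
  # Expand the list column to one row per cutoff.
  unnest(cutoff)
```

The intermediate list column is what sidesteps the single-value restriction; unnest() then produces one row per cutoff, with the type repeated for groups that have more than one.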