library(OptimalCutpoints)
library(dplyr)
Here is the test data:
set.seed(123)
df <- data.frame(
  label = rbinom(1000, size = 1, prob = 0.5),
  score = rnorm(1000),
  type  = sample(c("A", "B", "C", "D"), 1000, replace = TRUE)
)
I group by 'type' with group_by and want to summarize each group with the optimal.cutpoints function from the OptimalCutpoints package:
df%>%group_by(type)%>%summarize(Opt_cut=optimal.cutpoints(X = "score", status = "label", tag.healthy = 0, methods = "MaxSpSe",data=df[,1:2]))
I get this:
Error: expecting a single value
As a workaround, I can extract each 'type' and run optimal.cutpoints on it separately:
df_A<-df%>%filter(grepl("A",type))
opt.cut.df.A <- optimal.cutpoints(X = "score", status = "label", tag.healthy = 0, methods = "MaxSpSe", data = df_A)
From opt.cut.df.A I can extract the optimal cutoff like this:
opt.cut.df.A[1]$MaxSpSe$Global$optimal.cutoff$cutoff
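As a side note, the leading [1] in the expression above does not appear to be needed, because subsetting a list with [1] keeps the element name and $ then selects it again. The sketch below illustrates this on a mock nested list. The object `fit` is a hypothetical stand-in that only mimics the element names used above; it is not real output from optimal.cutpoints, and the numbers are placeholders.

```r
# Hypothetical stand-in that mimics only the nesting used in the question;
# a real optimal.cutpoints result contains many more components.
fit <- list(
  MaxSpSe = list(
    Global = list(
      optimal.cutoff = list(cutoff = c(0.16, 0.18))
    )
  ),
  methods = "MaxSpSe"
)

# Form used in the question: subset to a one-element list, then descend by name
via_bracket <- fit[1]$MaxSpSe$Global$optimal.cutoff$cutoff

# Shorter equivalent: descend by name directly
via_dollar <- fit$MaxSpSe$Global$optimal.cutoff$cutoff

# Base R also accepts a vector of names for recursive [[ indexing on lists
via_path <- fit[[c("MaxSpSe", "Global", "optimal.cutoff", "cutoff")]]
```

All three expressions return the same numeric vector for this mock object.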
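But this is definitely not the best way, especially with a large number of types. Unless I am missing something, it looks like summarize only works with functions that return a single value.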
Answer 0 (score: 2)
Another option, using purrr:
library(purrr)
df %>%
  split(.$type) %>%
  map(~ optimal.cutpoints(X = "score", status = "label",
                          tag.healthy = 0, methods = "MaxSpSe", data = .)) %>%
  map(c("MaxSpSe", "Global", "optimal.cutoff", "cutoff"))
This gives:
#$A
#[1] -0.0768659
#
#$B
#[1] 0.1612264 0.1830480
#
#$C
#[1] -0.08671413
#
#$D
#[1] 0.1071904 0.1155321 0.1390979
If you want the results in a data.frame, you can add map_df to the chain:
df %>%
  split(.$type) %>%
  map(~ optimal.cutpoints(X = "score", status = "label",
                          tag.healthy = 0, methods = "MaxSpSe", data = .)) %>%
  map(c("MaxSpSe", "Global", "optimal.cutoff", "cutoff")) %>%
  map_df(~ data.frame(cutoff = .), .id = "type")
This gives:
# type cutoff
#1 A -0.07686590
#2 B 0.16122635
#3 B 0.18304797
#4 C -0.08671413
#5 D 0.10719041
#6 D 0.11553210
#7 D 0.13909786
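Answer 1 (score: 1)
You can also use the split-apply approach (here with split and lapply) to build a list of models and then extract the values from that list.
# Fit one model per type
listOfModels <- lapply(split(df, df$type), function(subDf)
  optimal.cutpoints(X = "score", status = "label",
                    tag.healthy = 0, methods = "MaxSpSe", data = subDf))
# Pull the optimal cutoff(s) out of each model
lapply(listOfModels, function(model) model[1]$MaxSpSe$Global$optimal.cutoff$cutoff)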
$A
[1] -0.0768659
$B
[1] 0.1612264 0.1830480
$C
[1] -0.08671413
$D
[1] 0.1071904 0.1155321 0.1390979
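If a data frame is preferred over a named list, base R's stack() can flatten it without extra packages. The sketch below is a minimal illustration: the list `cutoffs` is typed in by hand from the output shown above rather than recomputed, and the names `cutoffs` and `cutoff_df` are made up for this example.

```r
# Values copied from the list output above (not recomputed here)
cutoffs <- list(
  A = -0.0768659,
  B = c(0.1612264, 0.1830480),
  C = -0.08671413,
  D = c(0.1071904, 0.1155321, 0.1390979)
)

# stack() returns a data frame with columns `values` and `ind`
cutoff_df <- stack(cutoffs)

# Rename and reorder the columns to match the purrr answer's layout
names(cutoff_df) <- c("cutoff", "type")
cutoff_df <- cutoff_df[, c("type", "cutoff")]

print(cutoff_df)
```

The `type` column comes back as a factor; wrap it in as.character() if plain strings are needed.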
Answer 2 (score: 1)
library(data.table)
setDT(df)[,opt(.SD), by=type]
type V1
1: A -0.07686590
2: D 0.10719041
3: D 0.11553210
4: D 0.13909786
5: B 0.16122635
6: B 0.18304797
7: C -0.08671413
where opt is the cutoff function:
opt <- function(df) {
  optimal.cutpoints(X = "score", status = "label", tag.healthy = 0,
                    methods = "MaxSpSe", data = df)[1]$MaxSpSe$Global$optimal.cutoff$cutoff
}
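dplyr does not work here because some groups have a single cutoff while others have several. summarise expects exactly one value per group, so result vectors of differing lengths cause a problem.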
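For readers who want to stay within dplyr, one possible route is group_modify(), which lets each group return a data frame with any number of rows. The following is a sketch under a few assumptions: it requires a dplyr version that provides group_modify() (0.8.1 or later), the helper name `cutoffs_for` is made up for this example, and the as.data.frame() call is a precaution in case optimal.cutpoints does not handle tibbles. Note also that the call in the question passes data=df[,1:2], which is the full data set rather than the current group's rows; the helper below receives only the group's rows.

```r
library(OptimalCutpoints)
library(dplyr)

set.seed(123)
df <- data.frame(
  label = rbinom(1000, size = 1, prob = 0.5),
  score = rnorm(1000),
  type  = sample(c("A", "B", "C", "D"), 1000, replace = TRUE)
)

# Hypothetical helper (not part of either package): fits one subset and
# returns every optimal cutoff found as a one-column data frame.
cutoffs_for <- function(sub_df) {
  fit <- optimal.cutpoints(
    X = "score", status = "label", tag.healthy = 0,
    methods = "MaxSpSe",
    data = as.data.frame(sub_df)  # precaution: pass a plain data.frame
  )
  data.frame(cutoff = fit$MaxSpSe$Global$optimal.cutoff$cutoff)
}

# group_modify() calls the helper once per type and row-binds the results,
# so groups with several cutoffs simply contribute several rows.
result <- df %>%
  group_by(type) %>%
  group_modify(~ cutoffs_for(.x)) %>%
  ungroup()

print(result)
```

The result has one row per cutoff with columns `type` and `cutoff`, the same layout as the map_df output above. The exact values may differ from those shown in the answers, since the output of sample() for a given seed depends on the R version.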