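I have a classification variable and several dozen ordinal features. I want to find the smallest subset of features that, when summed, yields the most accurate classification. I tried enumerating every combination of features, computing a total score for each combination, and then determining the optimal cutoff to maximize sensitivity and specificity. Here is what I tried: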
library(gtools)
library(OptimalCutpoints)
set.seed(2)
# create fake data for 1 classification variable and just 5 features
df <- data.frame(class=sample(0:1, 50, replace=T),
v01=sample(0:3, 50, replace=T),
v02=sample(0:3, 50, replace=T),
v03=sample(0:3, 50, replace=T),
v04=sample(0:3, 50, replace=T),
v05=sample(0:3, 50, replace=T))
# combinations
vars <- list()
out <- list()
for (i in 2:(length(df)-1)) {
p <- combinations(n = length(df)-1, r = i, v = names(df[2:(length(df))]))
for (r in 1:nrow(p)) {
keep <- c("class", p[r,])
df_ <- df[, keep]
df_$T <- rowSums(df_[,2:length(keep)])
oc <- summary(optimal.cutpoints(X = "T",
status = "class",
tag.healthy = 0,
methods = "SpEqualSe",
data = df_,
pop.prev = NULL,
categorical.cov = NULL,
control = control.cutpoints(),
ci.fit = TRUE,
conf.level = 0.95,
trace = FALSE))
name <- paste(i, r, sep=".")
vars[[name]] <- append(vars, p[r,])
out[[name]] <- append(out, oc) # when I inspect out R stalls
}
}
I don't think I am going about this the right way.
Answer 0 (score: 0)
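This may (a) drive the anti-loop crowd crazy, and (b) become very slow as the number of variables grows and the number of combinations goes through the roof, but I think it "works".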
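To put a rough number on "through the roof": the loop visits every subset of size 2 up to n, which is 2^n - n - 1 subsets. The helper below is only an illustration (the name `n_subsets` is hypothetical, and it assumes n is at least 2).

```r
# Number of feature subsets of size 2..n, i.e. every subset the loop visits.
# Assumes n >= 2 (2:n would count backwards for n = 1).
n_subsets <- function(n) sum(choose(n, 2:n))

n_subsets(5)   # the 5-feature toy data in the question
n_subsets(30)  # "dozens" of features: 2^30 - 31 subsets
```

With 5 features that is 26 subsets; with 30 features it is already more than a billion, so an exhaustive search is probably only practical for a small number of features.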
library(gtools)
library(OptimalCutpoints)
# create fake data
df <- data.frame(class=sample(0:1, 50, replace=T),
v01=sample(0:3, 50, replace=T),
v02=sample(0:3, 50, replace=T),
v03=sample(0:3, 50, replace=T),
v04=sample(0:3, 50, replace=T),
v05=sample(0:3, 50, replace=T))
The basic idea is to loop through every combination of variables, from sets of 2 up to sets of 5 variables. For each combination, I compute a scale score and then determine optimal.cutpoints. I extract the details of the optimal.cutpoints object and store them in a data frame that grows with each pass.
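A minimal sketch of the loop described above, under some assumptions. To stay self-contained it uses base R only: `combn` stands in for `gtools::combinations`, and the hypothetical helper `se_equal_sp` is a hand-rolled stand-in for `optimal.cutpoints(methods = "SpEqualSe")` that picks the cutoff where sensitivity and specificity are closest. It is not the OptimalCutpoints implementation and may choose a different cutoff in edge cases.

```r
set.seed(2)
# Fake data with the same shape as in the question
df <- data.frame(class = sample(0:1, 50, replace = TRUE),
                 v01 = sample(0:3, 50, replace = TRUE),
                 v02 = sample(0:3, 50, replace = TRUE),
                 v03 = sample(0:3, 50, replace = TRUE),
                 v04 = sample(0:3, 50, replace = TRUE),
                 v05 = sample(0:3, 50, replace = TRUE))

# Hypothetical helper (an assumption, not part of any package):
# try every observed score as a cutoff, predicting class 1 when
# score >= cutoff, and keep the cutoff where Se and Sp are closest.
se_equal_sp <- function(score, status) {
  cuts <- sort(unique(score))
  se <- sapply(cuts, function(k) mean(score[status == 1] >= k))
  sp <- sapply(cuts, function(k) mean(score[status == 0] < k))
  best <- which.min(abs(se - sp))
  data.frame(cutoff = cuts[best], Se = se[best], Sp = sp[best])
}

features <- setdiff(names(df), "class")
rows <- list()
for (i in 2:length(features)) {
  # every subset of size i, each as a character vector of column names
  for (vars in combn(features, i, simplify = FALSE)) {
    total <- rowSums(df[, vars, drop = FALSE])  # summed scale score
    res <- se_equal_sp(total, df$class)
    # store exactly one row per pass
    rows[[length(rows) + 1]] <- cbind(
      data.frame(n_vars = i, vars = paste(vars, collapse = "+")), res)
  }
}
results <- do.call(rbind, rows)

# Best balanced accuracy first, smaller subsets first among ties
results <- results[order(-pmin(results$Se, results$Sp), results$n_vars), ]
head(results)
```

Each pass stores a single row, and the rows are bound into one data frame at the end. That differs from `out[[name]] <- append(out, oc)` in the question, which copies the whole existing list into each new element, so `out` roughly doubles on every iteration. This is probably why inspecting it stalls.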