Question

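Hello, I am trying to use the cv.tree function from the tree package. I have a binary categorical response (called Label) and 30 predictors. I fit a tree object using all of the predictors.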

I get the following error message, which I don't understand:

Error in as.data.frame.default(data, optional = TRUE) : cannot coerce class ""function"" to a data.frame

The data is the file 'training' taken from this website.

This is what I did:

x <- read.csv("training.csv")
attach(x)
library(tree)
Tree <- tree(Label~., x, subset=sample(1:nrow(x), nrow(x)/2))
CV <- cv.tree(Tree,FUN=prune.misclass)

Answer 1

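I think the problem is in the list of predictor variables. The following works, but I think you need to read the problem description more carefully. First, set up the formula without Weight.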

x <- read.csv("training.csv")
vars<-setdiff(names(x),c("EventId","Label","Weight"))
fmla <- paste("Label", "~", vars[1], "+", 
           paste(vars[-c(1)], collapse=" + "))
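As an alternative to pasting the formula together by hand, base R's reformulate() can build it from the vector of predictor names. The sketch below is only an illustration: the toy data frame and its values are made up, and only the column names come from this thread. Note that it produces a formula object rather than the character string used above.

```r
# Toy stand-in for the training data; these rows are made up for illustration.
x <- data.frame(
  EventId = 1:4,
  DER_mass_MMC = c(138.5, 160.9, 125.2, 143.9),
  DER_mass_vis = c(97.8, 103.2, 81.4, 88.0),
  Weight = c(0.002, 2.23, 0.018, 1.68),
  Label = factor(c("s", "b", "s", "b"))
)

# Drop the identifier, the response, and the weight from the predictors.
vars <- setdiff(names(x), c("EventId", "Label", "Weight"))

# reformulate() assembles a formula object from the predictor names.
fmla <- reformulate(vars, response = "Label")
print(fmla)
```

The resulting fmla can be passed wherever a formula is expected, and the list of excluded columns stays in one place if it needs to change.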

This is what you have been running:

Tree <- tree(fmla, x, subset=sample(1:nrow(x), nrow(x)/2))
plot(Tree)

Instead, subset the data first, then fit the tree on the subset and cross-validate:

urows <- sample(1:nrow(x), nrow(x)/2)
x_sub <- x[urows,]
Tree <- tree(fmla, x_sub)
plot(Tree)
CV <- cv.tree(Tree,FUN=prune.misclass)
CV
$size
[1] 6 5 4 3 1

$dev
[1] 25859 25859 27510 30075 42725

$k
[1]   -Inf    0.0 1929.0 2791.0 6188.5

$method
[1] "misclass"

attr(,"class")
[1] "prune"         "tree.sequence"

You may also want to consider the rpart package:

library(rpart)
tr <- rpart(fmla, data=x_sub, method="class")
printcp(tr)

Classification tree:
rpart(formula = fmla, data = x_sub, method = "class")

Variables actually used in tree construction:
[1] DER_mass_MMC                DER_mass_transverse_met_lep
[3] DER_mass_vis               

Root node error: 42616/125000 = 0.34093

n= 125000 

        CP nsplit rel error  xerror      xstd
1 0.153733      0   1.00000 1.00000 0.0039326
2 0.059274      2   0.69253 0.69479 0.0035273
3 0.020016      3   0.63326 0.63582 0.0034184
4 0.010000      5   0.59323 0.59651 0.0033393

If you include Weight among the predictors, it becomes the only split.

vars<-setdiff(names(x),c("EventId","Label"))

Answer 2

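The error occurs when cv.tree calls model.frame. The 'call' element of the tree object must contain a reference to a data frame whose name is not also the name of a loaded function.

So it is not only subsetting inside the call to tree that generates the error when cv.tree later uses the 'call' element of the tree object. Using a data frame named "df" also produces the error, because model.frame takes this as the name of an existing function (namely the density of the F distribution from the stats package).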
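The idea can be illustrated with base R alone. The sketch below is a stand-in that uses lm() rather than the tree package, so it does not reproduce what cv.tree does internally. It only shows that a fitted model stores its data argument as a name, and that a name such as df can resolve to a function when no data frame of that name is visible. The small data frame d is made up.

```r
# Stand-in example using lm(), NOT the tree package: a fitted model object
# keeps the call that created it, with the data argument stored as a name.
d <- data.frame(y = c(1, 2, 3, 4), x = c(2, 4, 5, 8))
fit <- lm(y ~ x, data = d)
data_arg <- fit$call$data
print(data_arg)          # the symbol d, not a copy of the data
print(class(data_arg))   # "name"

# Evaluating the stored name looks the object up again at that moment.
same <- identical(eval(data_arg), d)
print(same)

# Assumption: a fresh session in which the user has not created an object
# called df. The name then resolves to the F-distribution density in stats.
df_is_function <- is.function(df)
print(df_is_function)

# Coercing that function to a data frame fails.
msg <- tryCatch(
  {
    as.data.frame(df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Giving the data frame a name that no loaded function shares, and subsetting it before the model is fitted, avoids both situations described above.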

Error when using cv.tree
