Question

I set the response variable as a factor with as.factor(response) and then ran:

tree = ctree(response~., data=trainingset)

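When I plot this tree, the nodes show a vector of y values, for example y = (0.095, 0.905, 0). I noticed that the three values sum to 1.

But the actual response variable only contains the values 0, 1 and 99.

Can someone help me interpret this vector in the ctree plot? Thanks!

The actual code looks like this: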

library("party")

response = as.factor(data$response)
# data.frame() keeps response as a factor; cbind() on a numeric matrix would turn it into integer codes
newdata = data.frame(predictor.matrix, response)

ind = sample(2, nrow(newdata), replace=TRUE, prob=c(0.7, 0.3))
trainData = newdata[ind==1,]
testData = newdata[ind==2,]

tree = ctree(response~., data=trainData)
plot(tree, type="simple")

Answer 1

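These are the posterior probabilities of each of your classes; i.e., for class 1 the posterior probability in that node is ~0.9 (90%), assuming your factor levels are c(0, 1, 99).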
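To see which position of the vector belongs to which class, it helps to look at how as.factor() treats numeric values. A minimal base-R sketch, using a hypothetical response vector containing the three values from the question: the numbers become the character levels "0", "1" and "99" (sorted numerically), and the posterior vector follows that level order. It also shows a common pitfall: as.numeric() on such a factor returns the internal codes 1, 2, 3, not the original numbers. The relabelling at the end uses hypothetical label strings.

```r
# Hypothetical numeric response with the three values from the question
raw <- c(1, 0, 99, 1, 1, 0)
response <- as.factor(raw)

# Levels are character strings, sorted numerically: "0" "1" "99"
print(levels(response))

# as.numeric() gives the internal level codes (1, 2, 3), not 0/1/99
print(as.numeric(response))

# Recover the original numbers from the level labels instead
print(as.numeric(levels(response))[response])

# Optional: descriptive labels (hypothetical names) make a
# type = "simple" plot easier to read than bare numbers
labelled <- factor(raw, levels = c(0, 1, 99),
                   labels = c("class_0", "class_1", "class_99"))
print(levels(labelled))
```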

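In practical terms, the vector y = (0.095, 0.905, 0) means that ~90% of the observations in that node belong to class 1, ~10% (0.095) belong to class 0, and none belong to class 99.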
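The arithmetic behind such a vector can be sketched in base R. The node contents below are hypothetical: 2 observations of class 0 and 19 of class 1 were chosen only because 2/21 and 19/21 round to the asker's 0.095 and 0.905. The point is that y is the share of each factor level among the training observations in the node, listed in level order, with absent levels kept at 0.

```r
# Hypothetical terminal node: 2 observations of class "0", 19 of class "1",
# none of class "99" (counts invented to reproduce y = (0.095, 0.905, 0))
node_response <- factor(c(rep("0", 2), rep("1", 19)),
                        levels = c("0", "1", "99"))

# Counts per level, in level order; "99" is kept with a count of 0
counts <- table(node_response)

# Proportions per level: the y vector shown in the terminal node
y <- prop.table(counts)
print(round(y, 3))

# The proportions always sum to 1
print(sum(y))
```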

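I think what is throwing you is that your classes have numeric levels, and the plot shows posterior probabilities, which are also numeric. If we look at the example from the party package, where the response is a factor with character levels, hopefully the plot and the tree output will make more sense.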

From the examples in ?ctree:

library("party")
irisct <- ctree(Species ~ ., data = iris)
irisct

R> irisct

	 Conditional inference tree with 4 terminal nodes

Response:  Species
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations:  150

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)*  weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)*  weights = 46
    4) Petal.Length > 4.8
      6)*  weights = 8
  3) Petal.Width > 1.7
    7)*  weights = 46

Here, Species is a factor variable with levels

R> with(iris, levels(Species))
[1] "setosa"     "versicolor" "virginica"

Plotting the tree with type = "simple" shows the numeric posterior probabilities in the terminal nodes:

plot(irisct, type = "simple")

A more informative plot is:

plot(irisct)

because it clearly shows that each terminal node contains a number of observations from one or more classes, which is how the posterior probabilities are computed.
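The link between node contents and posterior probabilities can also be checked without the plot. This base-R sketch subsets iris by hand using the split conditions printed for terminal nodes 2 and 7 and tabulates Species; it is an illustration of the computation, not a call into party. The row counts should match the printed weights (50 and 46), and the proportions should match the y vectors in the type = "simple" plot.

```r
# Terminal node 2: Petal.Length <= 1.9
node2 <- subset(iris, Petal.Length <= 1.9)
node2_counts <- table(node2$Species)
print(node2_counts)
print(prop.table(node2_counts))

# Terminal node 7: Petal.Length > 1.9 and Petal.Width > 1.7
node7 <- subset(iris, Petal.Length > 1.9 & Petal.Width > 1.7)
node7_counts <- table(node7$Species)
print(node7_counts)
print(round(prop.table(node7_counts), 3))
```

Node 2 turns out to be pure (all setosa), while node 7 mixes one versicolor with 45 virginica, which is why its vector is not simply (0, 0, 1).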

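Predictions from the tree are given by the predict() method:

predict(irisct)

R> predict(irisct)
[1] setosa setosa setosa setosa setosa setosa
[7] setosa setosa setosa setosa setosa setosa
[13] setosa setosa setosa setosa setosa setosa
....

You can get the posterior probabilities for each observation via the treeresponse() function.

In other words, a classification ctree reports a vector of class probabilities for each node rather than a single scalar value.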
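As a rough illustration of how the two outputs relate: an observation's posterior vector is the class distribution of the terminal node it falls into, and the class returned by predict() is, in effect, the level with the largest posterior probability. This base-R sketch hard-codes the node 7 proportions of the iris tree (1/46 and 45/46) instead of calling treeresponse(), so it runs without party; in party, treeresponse() returns a list with one such probability vector per observation.

```r
# Posterior class probabilities for an observation in terminal node 7
# (hard-coded here rather than taken from treeresponse())
posterior <- c(setosa = 0, versicolor = 1/46, virginica = 45/46)

# The predicted class is the level with the highest posterior probability
predicted <- names(which.max(posterior))
print(predicted)

# The same rule applied to several observations, one probability vector each
posteriors <- list(c(setosa = 1, versicolor = 0, virginica = 0),
                   posterior)
print(vapply(posteriors, function(p) names(which.max(p)), character(1)))
```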
