cforest in package party returns Inf for all predictions

Time: 2018-11-30 01:31:40

Tags: r random-forest party

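I am trying to use the cforest function from the R package party to analyse some right-censored survival data. Every time I call predict, I get Inf for every value, which means I cannot compute a concordance index.

My data can be downloaded here: https://www.dropbox.com/s/nt9s3p1rdafq465/test_data.csv?dl=0

Example: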

library(party)
library(survival)

mydata <- read.csv(file="test_data.csv", header=TRUE, sep=",",row.names=NULL)    
train<-head(mydata, n=800)
test<-tail(mydata, n=37)

cif_result <- cforest(Surv(timeToEvent, status) ~ V1 + V2 + V3 + V4 + V5 + V6, 
                    data = train,
                    control=cforest_classical())

cforest_pred <- predict(object = cif_result, newdata = test) 
cforest_pred

837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 
857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 
Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf 

Am I doing something wrong? Why does cforest predict only Inf for this data?

1 Answer:

Answer 0: (score: 1)

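The predict method for survival trees/forests in the party package returns the median survival time. Since fewer than 20% of the observations experience an event, no finite median survival time can be computed. Hence it is Inf. For example, consider the full-sample fit:

m <- survfit(Surv(timeToEvent, status) ~ 1, data = train)
plot(m)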
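
To illustrate why heavy censoring leads to an infinite median, here is a minimal sketch in base R. It is not party's internal code: the toy data and the helper names km_curve and km_median are hypothetical, and the median convention used (the first time the Kaplan-Meier curve is at or below 0.5, otherwise Inf) is an assumption made for illustration.

```r
# Hypothetical toy data (not the asker's): 10 subjects, only 2 events.
time   <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
status <- c(1, 0, 0, 1, 0, 0, 0, 0, 0, 0)   # 1 = event, 0 = right-censored

# Hand-rolled Kaplan-Meier estimate, evaluated at each distinct event time.
km_curve <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  surv <- 1
  out <- numeric(length(event_times))
  for (i in seq_along(event_times)) {
    t_i     <- event_times[i]
    at_risk <- sum(time >= t_i)                 # subjects still under observation
    events  <- sum(time == t_i & status == 1)   # events occurring at t_i
    surv    <- surv * (1 - events / at_risk)
    out[i]  <- surv
  }
  data.frame(time = event_times, surv = out)
}

# Assumed convention: median = first time the curve is <= 0.5, else Inf.
km_median <- function(curve) {
  hit <- which(curve$surv <= 0.5)
  if (length(hit) == 0) Inf else curve$time[hit[1]]
}

curve <- km_curve(time, status)
median_time <- km_median(curve)
print(curve)         # estimated survival stays well above 0.5
print(median_time)   # no crossing of 0.5, so the median is reported as Inf
```

With so few events the estimated survival curve never reaches 0.5, so there is no finite time to report as the median. Plotting the survfit object from the answer should show the same pattern for the real data.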
