Getting the survival function with Spark ML

Time: 2017-05-19 15:10:05

Tags: apache-spark pyspark survival-analysis

I am training an accelerated failure time (AFT) model with PySpark (from pyspark.ml.regression import AFTSurvivalRegression).

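Now I want to apply the model to new data and get the probability that the event occurs before time t (that is, one minus the survival function at t). Which method should I use? The documentation is not clear to me on this: https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.regression.AFTSurvivalRegression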
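For reference, Spark's AFTSurvivalRegression is based on a Weibull distribution of the survival time, so the probability asked about here can be written in closed form from the `prediction` value and the model's `scale`. The following is a minimal sketch under that assumption, not a built-in Spark method: the helper names `survival_prob` and `event_prob_before` are hypothetical, and the numeric inputs are made-up placeholders rather than values from a fitted model.

```python
import math


def survival_prob(t, prediction, scale):
    """S(t) = P(T > t) for a Weibull AFT model.

    prediction: exp(intercept + coefficients . x), i.e. the `prediction` column
    scale: the fitted model's scale parameter (the Weibull shape is 1 / scale)
    """
    if t <= 0:
        return 1.0  # survival times are positive, so nothing has happened yet
    return math.exp(-((t / prediction) ** (1.0 / scale)))


def event_prob_before(t, prediction, scale):
    """P(T <= t) = 1 - S(t)."""
    return 1.0 - survival_prob(t, prediction, scale)


# Placeholder inputs, assumed for illustration only (not from a fitted model).
prediction, scale = 5.0, 1.5
p = event_prob_before(2.0, prediction, scale)
print(round(p, 4))
```

With a fitted model, `prediction` would presumably come from `model.transform(df)` or `model.predict(features)`, and `scale` from `model.scale`; wrapping the function in a UDF is one way to apply it row by row.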

For example, if I do the following:

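from pyspark.sql import SparkSession
from pyspark.ml.regression import AFTSurvivalRegression
from pyspark.ml.linalg import Vectors

# Reuse the active session (the pyspark shell already defines `spark`)
spark = SparkSession.builder.getOrCreate()

# label = observed time; censor = 1.0 if the event occurred, 0.0 if censored
training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])

# Request the 25th and 75th percentiles of the predicted survival time
quantileProbabilities = [0.25, 0.75]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")

model = aft.fit(training)
# transform() appends the `prediction` and `quantiles` columns
model.transform(training).show(truncate=False)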

I get this output:

[image: output of model.transform(training).show(truncate=False), not preserved]

Does this mean that, for the first row, P(event occurs between 0.832 and 9.48) = 50%?
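One way to check this reading: under the Weibull AFT model that Spark fits, the quantile for probability q is prediction * (-log(1 - q)) ** scale, so the probability mass between the 0.25 and 0.75 quantiles should be 0.5 by construction. The sketch below confirms that numerically; the helper names are hypothetical and the parameter values are placeholders, not the fitted values behind the output above.

```python
import math


def weibull_quantile(q, prediction, scale):
    """Time t such that P(T <= t) = q under a Weibull AFT model."""
    return prediction * (-math.log1p(-q)) ** scale


def weibull_cdf(t, prediction, scale):
    """P(T <= t) under the same model."""
    return 1.0 - math.exp(-((t / prediction) ** (1.0 / scale)))


# Placeholder parameters, assumed for illustration only.
prediction, scale = 5.0, 1.5
t25 = weibull_quantile(0.25, prediction, scale)
t75 = weibull_quantile(0.75, prediction, scale)

# Probability that the event falls between the two quantiles.
mass = weibull_cdf(t75, prediction, scale) - weibull_cdf(t25, prediction, scale)
print(t25 < t75, round(mass, 6))
```

If that holds, the two numbers in the `quantiles` column bound the middle 50% of the predicted survival-time distribution for that row: a 25% chance the event occurs before the lower value and a 25% chance it occurs after the upper one.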

Thanks

0 answers:

No answers yet.