I am using PySpark to train an accelerated failure time (AFT) model (from pyspark.ml.regression import AFTSurvivalRegression).
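Now I want to apply the model to new data and get the probability that the event occurs before time t (that is, 1 - S(t), where S is the survival function). Which method should I use? The documentation does not make this clear to me: https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.regression.AFTSurvivalRegression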
For example, if I run the following:
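from pyspark.sql import SparkSession
from pyspark.ml.regression import AFTSurvivalRegression
from pyspark.ml.linalg import Vectors

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

# Columns: survival time (label), censor flag (1.0 = event observed, 0.0 = censored), features.
training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])

# Request the 25% and 75% quantiles of the predicted survival time.
quantileProbabilities = [0.25, 0.75]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")
model = aft.fit(training)
model.transform(training).show(truncate=False)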
I get the following output:
Does this mean that, for the first row, P(event occurs between 0.832 and 9.48) = 50%?
Thanks.
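To make the question concrete, here is a minimal sketch of what I think the quantiles mean. It assumes the model is the Weibull AFT model described in the Spark docs, that the `prediction` column equals exp(intercept + coefficients · features), and that `model.scale` is the fitted scale. The helper names `aft_event_probability` and `aft_quantile` and the numeric values are hypothetical; they are not part of the PySpark API and are not taken from my output.

```python
import math

def aft_event_probability(t, prediction, scale):
    """P(T <= t) for a Weibull AFT model (assumed parameterization).

    prediction: assumed to be exp(intercept + coefficients . features)
    scale:      assumed to be the fitted model.scale (Weibull shape = 1 / scale)
    """
    shape = 1.0 / scale
    return 1.0 - math.exp(-((t / prediction) ** shape))

def aft_quantile(p, prediction, scale):
    """Time t such that P(T <= t) = p under the same assumed model."""
    return prediction * (-math.log1p(-p)) ** scale

# Hypothetical values for one row, for illustration only.
prediction, scale = 5.0, 1.5

q25 = aft_quantile(0.25, prediction, scale)
q75 = aft_quantile(0.75, prediction, scale)

# Probability mass between the 25% and 75% quantiles.
prob_between = (aft_event_probability(q75, prediction, scale)
                - aft_event_probability(q25, prediction, scale))
print(q25, q75, prob_between)
```

If these assumptions hold, the two values in `quantiles` bracket the middle 50% of the predicted survival-time distribution, and the probability of the event before an arbitrary t could be computed from `prediction` and `model.scale` in the same way.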