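How to perform leave-one-out cross-validation in SPSS

Time: 2014-06-13 21:19:07

Tags: spss cross-validation

I can't figure out how to perform LOOCV in SPSS. I need to evaluate a simple linear regression $ Y = AX + B $. Thanks.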

1 answer:

Answer 0 (score: 1)

For linear regression this is pretty easy: SPSS lets you save the needed statistics directly in the REGRESSION command.

REGRESSION
  /NOORIGIN 
  /DEPENDENT Y
  /METHOD=ENTER X
  /SAVE PRED (PredAll) DFIT (CVFit).
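
Why no refitting is needed here: for a least-squares fit, the effect of deleting one case on its own prediction has a closed form. As a sketch in standard regression notation (the symbols $h_{ii}$ for the leverage of case $i$ and $e_i$ for its ordinary residual are my own additions, not part of the original answer), the saved DFIT value is the change in the predicted value of case $i$ when that case is deleted:

```latex
\begin{align}
e_i &= y_i - \hat{y}_i \\
\mathrm{DFFIT}_i &= \hat{y}_i - \hat{y}_{i(i)} = \frac{h_{ii}\, e_i}{1 - h_{ii}} \\
\hat{y}_{i(i)} &= \hat{y}_i - \mathrm{DFFIT}_i
\end{align}
```

Here $\hat{y}_{i(i)}$ denotes the prediction for case $i$ from the model fitted without case $i$, so subtracting the saved DFIT value from the saved full-sample prediction recovers the leave-one-out prediction.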

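The leave-one-out predictions can then be computed as

COMPUTE LeaveOneOut = PredAll - CVFit.

For non-linear models, where SPSS does not provide convenient SAVE values, you can instead build a stacked dataset that repeats the data once per case, with that case's outcome set to missing in its own copy. Then use SPLIT FILE and run whatever statistical procedure you want to obtain the leave-one-out statistics. Because the held-out case has a missing outcome, it is excluded from estimation within its copy but can still receive a saved prediction. If your id variable is simply the row number of the dataset, you only need two nested loops running up to the maximum case number, and then match the needed information onto the new file.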

Below is an example of this process.

*Making some fake data to work with.
INPUT PROGRAM.
LOOP Id = 1 TO 10.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
COMPUTE X = RV.NORMAL(10,5).
COMPUTE Y = 3 + 0.2*(X) + RV.NORMAL(0,0.2).
FORMATS Id (F2.0) X Y (F4.2).
EXECUTE.

*Original regression model, saving the leave-one-out fits.
REGRESSION
  /NOORIGIN 
  /DEPENDENT Y
  /METHOD=ENTER X
  /SAVE PRED (PredAll) DFIT (CVFit).    

*Manual way to create the stacked dataset.
*This can also be used with other, non-linear models.
INPUT PROGRAM.
COMPUTE #Cases = 10.
LOOP #Id = 1 TO #Cases.
  LOOP #Iter = 1 TO #Cases.
    COMPUTE L1O = #Iter.
    COMPUTE Id = #Id.
    END CASE.
  END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME LeaveOneOut.

*Merging in original data.
MATCH FILES FILE = *
  /TABLE = 'Sim'
  /BY Id.

*Set Y to missing for the held-out case in each copy of the data.
IF (L1O = Id) Y = $SYSMIS.
SORT CASES BY L1O.
SPLIT FILE BY L1O.
*You can replace REGRESSION with whatever procedure you are interested in.
REGRESSION
  /NOORIGIN 
  /DEPENDENT Y
  /METHOD=ENTER X
  /SAVE PRED (CVFit2).
SPLIT FILE OFF.

*This compares the original leave-one-out predictions with the new ones.
*They are the same apart from some floating point differences.
COMPUTE Test = (CVFit2 - (PredAll-CVFit)).
TEMPORARY.
SELECT IF (L1O = Id).
FREQ VAR Test.
EXECUTE.
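
Since the question asks how to evaluate the model, the leave-one-out predictions can be condensed into a single cross-validated error. This is a minimal sketch, not part of the original answer: it assumes the Sim dataset from the example above is still open and still holds PredAll and CVFit, and the variable names LeaveOneOut and SqErr are hypothetical names chosen for illustration.

```spss
*Sketch: summarise the leave-one-out prediction errors on the original data.
DATASET ACTIVATE Sim.
COMPUTE LeaveOneOut = PredAll - CVFit.
COMPUTE SqErr = (Y - LeaveOneOut)**2.
*The mean of SqErr is the leave-one-out mean squared prediction error.
*The sum of SqErr is the PRESS statistic.
DESCRIPTIVES VARIABLES = SqErr
  /STATISTICS = MEAN SUM.
```

Taking the square root of the reported mean gives a cross-validated RMSE on the scale of Y, which can be compared against the in-sample residual standard error from the REGRESSION output.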