R glm.fit not returning probabilities?

Date: 2015-11-26 16:03:32

Tags: r glm lm predict

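First post here, and I am new to R. Please bear with me if I have not gotten this post quite right :).

I am trying to fit a model with glm() and then call predict() on the fitted model.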

  fit_GLM <- glm(y ~., data = traintemp, family = "binomial")
  pred_GLM <- predict(fit_GLM, newdata = testtemp)

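My training data consists of about 430,000 observations, 6 predictors, and a binary outcome. I have tried coding the outcome both as 0/1 and as FALSE/TRUE.

My test data contains about 215,000 observations.

The model runs successfully, but predict() returns values that look a bit odd (to me). I was expecting probabilities, but a summary of the returned values shows: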

         Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
    -0.0433000 -0.0006504  0.0004760  0.0103800  0.0024810  1.0020000 

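Am I missing something obvious?

Also, if I run lm() instead, the results are very similar but it runs much faster. What is going on there?

Edit: a sample of my data: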

  TripType VisitNumber Weekday         Upc ScanCount DepartmentDescription FinelineNumber
1        0           7  Friday 60538815980         1                 SHOES           8931
2        0           7  Friday  7410811099         1         PERSONAL CARE           4504
3        0           8  Friday  2006613744         2 PAINT AND ACCESSORIES           1017
4        0           8  Friday  2006618783         2 PAINT AND ACCESSORIES           1017
5        0           8  Friday  7004802737         1 PAINT AND ACCESSORIES           2802
6        0           8  Friday  2238495318         1 PAINT AND ACCESSORIES           4501

Thank you, and happy Thanksgiving!

Edit 2: train:

   TripType Weekday         Upc ScanCount    DepartmentDescription FinelineNumber
1         0  Friday 60538815980         1                    SHOES           8931
2         0  Friday  7410811099         1            PERSONAL CARE           4504
3         0  Friday  2006613744         2    PAINT AND ACCESSORIES           1017
4         0  Friday  2006618783         2    PAINT AND ACCESSORIES           1017
5         0  Friday  7004802737         1    PAINT AND ACCESSORIES           2802
6         0  Friday  2238495318         1    PAINT AND ACCESSORIES           4501
7         0  Friday  5200010239         1              DSD GROCERY           4606
8         0  Friday 88679300501         2    PAINT AND ACCESSORIES           3504
9         0  Friday  2238400200         2    PAINT AND ACCESSORIES           3565
10        0  Friday 72450408840         1    PAINT AND ACCESSORIES           1028
11        0  Friday 25541500000         2                    DAIRY           1305
12        0  Friday 72450403700         2    PAINT AND ACCESSORIES           1018
13        0  Friday  7874204967         1 HOUSEHOLD CHEMICALS/SUPP            707
14        0  Friday  3270011053         3        PETS AND SUPPLIES           1001
15        0  Friday  1070080727         1      IMPULSE MERCHANDISE            115
16        0  Friday        3107         1                  PRODUCE            103
17        0  Friday        4011         1                  PRODUCE           5501
18        0  Friday  6414410235         1              DSD GROCERY           2008
19        0  Friday  4178900743         1        GROCERY DRY GOODS           3114
20        0  Friday  7800002374         1              DSD GROCERY           3467

Test:

   TripType Weekday         Upc ScanCount    DepartmentDescription FinelineNumber
1         0  Friday 68113152929        -1       FINANCIAL SERVICES           1000
2         0  Friday  2238403510         2    PAINT AND ACCESSORIES           3565
3         0  Friday  2006613743         1    PAINT AND ACCESSORIES           1017
4         0  Friday  2238400200        -1    PAINT AND ACCESSORIES           3565
5         0  Friday 22006000000         1    MEAT - FRESH & FROZEN           6009
6         0  Friday  2236760452         1    PAINT AND ACCESSORIES              7
7         0  Friday 88679300501        -1    PAINT AND ACCESSORIES           3504
8         0  Friday  3019294203         1    PAINT AND ACCESSORIES           2801
9         0  Friday  2310010776         1        PETS AND SUPPLIES           3300
10        0  Friday  5114139038         1    PAINT AND ACCESSORIES           4415
11        0  Friday  5114197561         1    PAINT AND ACCESSORIES           4415
12        0  Friday  2800053970         1  CANDY, TOBACCO, COOKIES            115
13        0  Friday  7794800902         1              DSD GROCERY           7950
14        0  Friday  7920018317         1      IMPULSE MERCHANDISE            110
15        0  Friday  3500076633         1            PERSONAL CARE            203
16        0  Friday  5460010568         1 HOUSEHOLD CHEMICALS/SUPP             52
17        0  Friday  2899521479         1       FABRICS AND CRAFTS           1059
18        0  Friday  2899521979         1       FABRICS AND CRAFTS           1062
19        0  Friday  1200004300         1              DSD GROCERY           9501
20        0  Friday 88743955560         1                MENS WEAR            144

1 answer:

Answer 0 (score: 2)

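From ?predict.glm, on the type argument:

the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.

So in your case: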

pred_GLM <- predict(fit_GLM, newdata = testtemp, type = "response")
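
To see the difference between the two scales, here is a minimal sketch on simulated data. The data frames `d` and `newd`, the variable names, and the coefficients are made up for illustration and are not taken from the question. It fits a binomial glm(), predicts on both scales, and checks that applying the inverse logit (plogis()) to the default link-scale predictions reproduces the type = "response" probabilities.

```r
# Simulated data: illustrative only, not the data from the question.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
d <- data.frame(x = x, y = y)

fit <- glm(y ~ x, data = d, family = "binomial")
newd <- data.frame(x = c(-2, 0, 2))

# Default (type = "link"): log-odds, which are not bounded to [0, 1].
p_link <- predict(fit, newdata = newd)

# type = "response": predicted probabilities.
p_resp <- predict(fit, newdata = newd, type = "response")

# The inverse logit maps the link scale onto the response scale.
same_scale <- isTRUE(all.equal(unname(plogis(p_link)), unname(p_resp)))
print(cbind(log_odds = p_link, probability = p_resp))
print(same_scale)
```

If you already have link-scale predictions stored, plogis() converts them to probabilities without having to call predict() again.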