glm function crashes RStudio

Time: 2018-11-24 00:48:27

Tags: r rstudio probability glm

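I am trying to find the probability of a swinging strike based on several factors (including the count, release speed, and spin rate). I have pitch data. I recoded the values of "Pitch Result" to 0 and 1 (1 is a swinging strike, 0 is anything else). The data set contains more than 380,000 rows.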

Data:

str(FastballData1)

'data.frame':   389035 obs. of  30 variables:
 $ GameId           : int  449257 449257 449257 449257 449257 449257 449257 449257 449257 447167 ...
 $ HomeTeamId       : int  108 108 108 108 108 108 108 108 108 108 ...
 $ AwayTeamId       : int  117 117 117 117 117 117 117 117 117 118 ...
 $ Inning           : int  1 1 5 1 1 2 1 2 5 4 ...
 $ InningTop        : int  0 0 0 0 0 0 0 0 0 0 ...
 $ PlateAppearance  : int  2 2 2 2 2 7 2 7 2 4 ...
 $ PitchOfPA        : int  4 6 1 1 2 4 3 3 3 1 ...
 $ Balls            : int  2 3 0 0 0 2 1 2 0 0 ...
 $ Strikes          : int  1 2 0 0 1 1 1 0 2 0 ...
 $ PitchResult      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ BatterId         : int  77171 77171 77171 77171 77171 77171 77171 77171 77171 77171 ...
 $ PitcherId        : int  64271 64271 64271 64271 64271 64271 64271 64271 64271 31921 ...
 $ CatcherId        : int  59270 59270 59270 59270 59270 59270 59270 59270 59270 67658 ...
 $ UmpireId         : int  427248 427248 427248 427248 427248 427248 427248 427248 427248 427156 ...
 $ BatSide          : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
 $ PitchHand        : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
 $ PitcherSet       : Factor w/ 2 levels "Stretch","Windup": 2 2 1 2 2 1 2 1 1 1 ...
 $ PitchType        : Factor w/ 2 levels "Fastball","Slider": 1 1 1 1 1 1 1 1 1 1 ...
 $ ReleaseSpeed     : Factor w/ 163594 levels "100.003","100.004",..: 118138 113206 96757 105450 108863 104792 99623 88801 106163 118932 ...
 $ PitchTimeToPlate : num  0.409 0.41 0.412 0.415 0.412 ...
 $ SpinAxis         : Factor w/ 176445 levels "-0.389885","-0.468337",..: 105104 92644 108361 90391 102961 103650 106045 109234 109359 123561 ...
 $ SpinRate         : Factor w/ 89264 levels "1002.54","1005.62",..: 51306 60703 54411 46083 47822 32355 50456 49402 47238 44709 ...
 $ HorzBreakPFX     : num  -6.64 -4.49 -7.85 -4.33 -5.69 ...
 $ VertBreakPFX     : num  8.44 9.62 8.92 10.16 7.95 ...
 $ ReleaseHeight    : num  5.48 5.4 5.32 5.51 5.53 ...
 $ ReleaseSide      : num  2.11 2.03 1.82 1.96 2.02 ...
 $ Extension        : num  5.91 6.18 6.45 5.88 6.07 ...
 $ VertApproachAngle: Factor w/ 319725 levels "-0.216948","-0.31733",..: 173138 100071 94338 93526 224459 83541 172155 116761 79538 90746 ...
 $ HorzApproachAngle: Factor w/ 409250 levels "-0.0001026","-0.000107",..: 130394 205101 54294 165802 185810 128876 159926 91503 109419 78589 ...
 $ bscount          : chr  "2-1" "3-2" "0-0" "0-0" ...

I am trying to run the following code:

glm(PitchResult ~ ReleaseSpeed + bscount + SpinRate, 
data = FastballData1, family = binomial)

However, when I run it, R reports that the call would require 96 GB (which is absurd), and RStudio then crashes. My guess is that this somehow creates a circular reference.

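Does anyone have suggestions on how to fix this? Or is there a better way to calculate the probability of a swinging strike from the factors above?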
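One thing the str() output above does show: ReleaseSpeed and SpinRate are stored as factors with 163,594 and 89,264 levels, so glm would build roughly 250,000 dummy columns for them across 389,035 rows. That may be what drives the memory request rather than a circular reference. The block below is a minimal sketch on a small synthetic data frame (the name toy and its simulated values are made up for illustration, not taken from FastballData1), and it assumes the factor labels are numeric text. It converts the factors back to numbers before fitting.

```r
# Small synthetic stand-in for FastballData1 (hypothetical values, illustration only).
set.seed(1)
n <- 500
toy <- data.frame(
  PitchResult  = rbinom(n, 1, 0.1),
  ReleaseSpeed = factor(round(rnorm(n, 93, 2), 3)),
  SpinRate     = factor(round(rnorm(n, 2200, 150), 2)),
  bscount      = sample(c("0-0", "0-1", "1-1", "3-2"), n, replace = TRUE)
)

# as.numeric() applied directly to a factor returns the internal level codes,
# so go through as.character() first to recover the measured values.
toy$ReleaseSpeed <- as.numeric(as.character(toy$ReleaseSpeed))
toy$SpinRate     <- as.numeric(as.character(toy$SpinRate))

# The count is genuinely categorical, so it stays a factor with a few levels.
toy$bscount <- factor(toy$bscount)

# Same formula as in the question, now with two numeric predictors.
fit <- glm(PitchResult ~ ReleaseSpeed + bscount + SpinRate,
           data = toy, family = binomial)

# Number of estimated coefficients: one per numeric predictor,
# one per non-reference bscount level, plus the intercept.
length(coef(fit))
```

On the real data, any entry that is not numeric text would become NA during the conversion (R warns about this), so it is worth counting the NAs afterwards. Such an entry may also be why the columns were read in as factors in the first place, though that is only a guess.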

0 answers:

No answers