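I am trying to estimate the probability of a swinging strike from several factors, including the count, release speed, and spin rate. I have pitch data, and I recoded the "PitchResult" values to 0 and 1 (1 is a swinging strike, 0 is anything else). The dataset contains more than 380,000 rows.
Data: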
str(FastballData1)
'data.frame': 389035 obs. of 30 variables:
$ GameId : int 449257 449257 449257 449257 449257 449257 449257 449257 449257 447167 ...
$ HomeTeamId : int 108 108 108 108 108 108 108 108 108 108 ...
$ AwayTeamId : int 117 117 117 117 117 117 117 117 117 118 ...
$ Inning : int 1 1 5 1 1 2 1 2 5 4 ...
$ InningTop : int 0 0 0 0 0 0 0 0 0 0 ...
$ PlateAppearance : int 2 2 2 2 2 7 2 7 2 4 ...
$ PitchOfPA : int 4 6 1 1 2 4 3 3 3 1 ...
$ Balls : int 2 3 0 0 0 2 1 2 0 0 ...
$ Strikes : int 1 2 0 0 1 1 1 0 2 0 ...
$ PitchResult : num 0 0 0 0 0 0 0 0 0 0 ...
$ BatterId : int 77171 77171 77171 77171 77171 77171 77171 77171 77171 77171 ...
$ PitcherId : int 64271 64271 64271 64271 64271 64271 64271 64271 64271 31921 ...
$ CatcherId : int 59270 59270 59270 59270 59270 59270 59270 59270 59270 67658 ...
$ UmpireId : int 427248 427248 427248 427248 427248 427248 427248 427248 427248 427156 ...
$ BatSide : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
$ PitchHand : Factor w/ 2 levels "L","R": 2 2 2 2 2 2 2 2 2 2 ...
$ PitcherSet : Factor w/ 2 levels "Stretch","Windup": 2 2 1 2 2 1 2 1 1 1 ...
$ PitchType : Factor w/ 2 levels "Fastball","Slider": 1 1 1 1 1 1 1 1 1 1 ...
$ ReleaseSpeed : Factor w/ 163594 levels "100.003","100.004",..: 118138 113206 96757 105450 108863 104792 99623 88801 106163 118932 ...
$ PitchTimeToPlate : num 0.409 0.41 0.412 0.415 0.412 ...
$ SpinAxis : Factor w/ 176445 levels "-0.389885","-0.468337",..: 105104 92644 108361 90391 102961 103650 106045 109234 109359 123561 ...
$ SpinRate : Factor w/ 89264 levels "1002.54","1005.62",..: 51306 60703 54411 46083 47822 32355 50456 49402 47238 44709 ...
$ HorzBreakPFX : num -6.64 -4.49 -7.85 -4.33 -5.69 ...
$ VertBreakPFX : num 8.44 9.62 8.92 10.16 7.95 ...
$ ReleaseHeight : num 5.48 5.4 5.32 5.51 5.53 ...
$ ReleaseSide : num 2.11 2.03 1.82 1.96 2.02 ...
$ Extension : num 5.91 6.18 6.45 5.88 6.07 ...
$ VertApproachAngle: Factor w/ 319725 levels "-0.216948","-0.31733",..: 173138 100071 94338 93526 224459 83541 172155 116761 79538 90746 ...
$ HorzApproachAngle: Factor w/ 409250 levels "-0.0001026","-0.000107",..: 130394 205101 54294 165802 185810 128876 159926 91503 109419 78589 ...
$ bscount : chr "2-1" "3-2" "0-0" "0-0" ...
I am trying to run the following code:
glm(PitchResult ~ ReleaseSpeed + bscount + SpinRate,
data = FastballData1, family = binomial)
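However, when I run this, R reports that the code would produce 96 GB (which is absurd), and RStudio crashes. My guess is that it is somehow creating a circular reference.
Does anyone have suggestions on how to fix this? Or is there a better way to calculate the probability of a swinging strike from the factors above?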
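One thing visible in the str() output above is that ReleaseSpeed and SpinRate are stored as factors with 163,594 and 89,264 levels rather than as numbers. In a glm formula every factor level gets its own dummy column, so this may be what inflates the model matrix. Below is a minimal sketch of converting such columns to numeric before fitting. It assumes the factor labels are numeric strings. The `toy` data frame and the `fac_to_num` helper are hypothetical stand-ins used only to make the example runnable; they are not part of the original data or code.

```r
# Hypothetical toy data that mimics the relevant columns of FastballData1.
set.seed(1)
n <- 200
toy <- data.frame(
  PitchResult  = rbinom(n, 1, 0.3),
  ReleaseSpeed = factor(sprintf("%.3f", runif(n, 88, 100))),     # numbers stored as factor labels
  SpinRate     = factor(sprintf("%.2f", runif(n, 1800, 2600))),  # same issue
  bscount      = sample(c("0-0", "1-0", "0-1", "1-1"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Convert through the labels: as.numeric() applied directly to a factor
# would return the internal integer codes, not the measured values.
fac_to_num <- function(x) as.numeric(as.character(x))

toy$ReleaseSpeed <- fac_to_num(toy$ReleaseSpeed)
toy$SpinRate     <- fac_to_num(toy$SpinRate)

# With numeric predictors, each of these variables contributes one coefficient
# instead of one dummy column per distinct value.
fit <- glm(PitchResult ~ ReleaseSpeed + bscount + SpinRate,
           data = toy, family = binomial)
n_coef <- length(coef(fit))
```

On the real data, any label that is not a valid number (for example a text placeholder for a missing value) would become NA with a warning during the conversion. Such entries are a common reason for numeric columns being read in as factors, so it may be worth checking for them with something like sum(is.na(...)) after converting.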