COUNTIF in R with multiple criteria

Time: 2016-07-31 14:34:37

Tags: r function vector subset countif

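I have event-file data from retrosheet.org. It is play-by-play baseball data: each observation describes one play in one game of a baseball season (including reference variables for the game, the players, and the play).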

> str(e.2015.1990)
'data.frame':   4813807 obs. of  42 variables:
 $ GAME.ID                              : Factor w/ 60464 levels "ANA201504100",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ INNING                               : num  1 1 1 1 1 1 1 1 1 2 ...
 $ BATTING.TEAM                         : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 2 2 1 ...
 $ OUTS                                 : int  0 1 2 2 2 2 0 1 2 0 ...
 $ BATTER                               : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
 $ BATTER.HAND                          : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
 $ RES.BATTER                           : Factor w/ 5107 levels "abrej003","ackld001",..: 73 167 33 120 163 100 34 256 200 209 ...
 $ RES.BATTER.HAND                      : Factor w/ 2 levels "L","R": 2 1 2 1 2 1 1 2 2 2 ...
 $ PITCHER                              : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
 $ PITCHER.HAND                         : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
 $ RES.PITCHER                          : Factor w/ 3481 levels "abadf001","albem001",..: 187 187 187 187 187 187 204 204 204 187 ...
 $ RES.PITCHER.HAND                     : Factor w/ 2 levels "L","R": 1 1 1 1 1 1 1 1 1 1 ...
 $ FIRST.RUNNER                         : Factor w/ 4369 levels "","abrej003",..: 1 1 1 1 104 140 1 1 1 1 ...
 $ SECOND.RUNNER                        : Factor w/ 4048 levels "","abrej003",..: 1 1 1 26 1 90 1 1 1 1 ...
 $ THIRD.RUNNER                         : Factor w/ 3729 levels "","ackld001",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ EVENT.TEXT                           : chr  "63/G" "6/P" "D8/L+" "S9/G.2-H" ...
 $ EVENT.TYPE                           : Factor w/ 21 levels "2","3","4","5",..: 1 1 19 18 18 1 1 1 1 1 ...
 $ AB.FLAG                              : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ HIT.VALUE                            : int  1 1 3 2 2 1 1 1 1 1 ...
 $ SH.FLAG                              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ SF.FLAG                              : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ DOUBLE.PLAY.FLAG                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ TRIPLE.PLAY.FLAG                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ RBI.ON.PLAY                          : num  0 0 0 1 0 0 0 0 0 0 ...
 $ BATTED.BALL.TYPE                     : Factor w/ 5 levels "","F","G","L",..: 3 5 4 3 4 5 3 3 5 4 ...
 $ BATTER.DEST                          : int  0 0 2 1 1 0 0 0 0 0 ...
 $ RUNNER.ON.1ST.DEST                   : int  0 0 0 0 2 1 0 0 0 0 ...
 $ RUNNER.ON.2ND.DEST                   : int  0 0 0 4 0 2 0 0 0 0 ...
 $ RUNNER.ON.3RD.DEST                   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ SB.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ SB.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ SB.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ CS.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ CS.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ CS.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ PO.FOR.RUNNER.ON.1ST.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ PO.FOR.RUNNER.ON.2ND.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ PO.FOR.RUNNER.ON.3RD.FLAG            : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.1ST: Factor w/ 3433 levels "","albua001",..: 1 1 1 1 161 161 1 1 1 1 ...
 $ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.2ND: Factor w/ 3408 levels "","abadf001",..: 1 1 1 133 1 133 1 1 1 1 ...
 $ RESPONSIBLE.PITCHER.FOR.RUNNER.ON.3RD: Factor w/ 3337 levels "","abadf001",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ EVENT.NUM                            : Factor w/ 177 levels "1","10","100",..: 1 90 101 112 123 134 145 156 167 2 ...

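From this, I want to compute each player's totals for each game. I want to build a data frame in which each observation describes one player's performance in one game of the season, with every player in every game making up one observation.

I created a new data frame with two columns, GAME.ID and PLAYER.ID, such that every STARTER in every game makes up one observation.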

> str(k.2015.1990)
'data.frame':   1146866 obs. of  2 variables:
 $ GAME.ID  : Factor w/ 60464 levels "ANA201504100",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ PLAYER.ID: Factor w/ 4699 levels "altuj001","bettm001",..: 11 11 11 12 14 12 12 24 24 24 ...

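I think what I need to do next is create additional vectors (one for each stat I want to compute), such that each observation in a vector defines a unique subset of my event data, defined as follows: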

e.2015.1990$GAME.ID = k.2015.1990$GAME.ID
e.2015.1990$PLAYER.ID = k.2015.1990$PLAYER.ID

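and then compute the stat from that subset. I know how to create vectors and subsets in R, but not how to create a vector that builds a unique subset for each observation. I think I need to use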

function(x)

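to do this; however, I am new to R and have no experience with it.

For convenience, I have tried to make a reproducible example. In this example, the goal is to count the total number of hits for each player in the first two games of the Angels' 2015 regular season.

I made a subset of the event-file data containing the 156 observations that correspond to those two games. For simplicity, I included only the variables GAME.ID, BATTER, and HIT.VALUE.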

         GAME.ID   BATTER HIT.VALUE
1   ANA201504100 escoa003         1
2   ANA201504100 mousm001         1
3   ANA201504100 cainl001         3
4   ANA201504100 hosme001         2
5   ANA201504100 morak001         2
6   ANA201504100 gorda001         1
7   ANA201504100 calhk001         1
8   ANA201504100 troum001         1
9   ANA201504100 pujoa001         1
10  ANA201504100 riosa002         1
11  ANA201504100 peres002         1
12  ANA201504100 infao001         1
13  ANA201504100 freed001         1
14  ANA201504100 cronc002         1
15  ANA201504100 aybae001         1
16  ANA201504100 escoa003         1
17  ANA201504100 mousm001         1
18  ANA201504100 cainl001         1
19  ANA201504100 hosme001         1
20  ANA201504100 morak001         1
21  ANA201504100 iannc001         1
22  ANA201504100 cowgc001         2
23  ANA201504100 giavj001         1
24  ANA201504100 calhk001         3
25  ANA201504100 troum001         1
26  ANA201504100 pujoa001         1
27  ANA201504100 gorda001         1
28  ANA201504100 riosa002         1
29  ANA201504100 peres002         1
30  ANA201504100 freed001         2
31  ANA201504100 cronc002         1
32  ANA201504100 aybae001         1
33  ANA201504100 iannc001         1
34  ANA201504100 infao001         1
35  ANA201504100 escoa003         2
36  ANA201504100 mousm001         1
37  ANA201504100 cainl001         2
38  ANA201504100 hosme001         1
39  ANA201504100 cowgc001         1
40  ANA201504100 giavj001         1
41  ANA201504100 calhk001         1
42  ANA201504100 morak001         5
43  ANA201504100 gorda001         1
44  ANA201504100 riosa002         1
45  ANA201504100 peres002         1
46  ANA201504100 troum001         2
47  ANA201504100 pujoa001         1
48  ANA201504100 freed001         5
49  ANA201504100 cronc002         1
50  ANA201504100 infao001         1
51  ANA201504100 escoa003         1
52  ANA201504100 mousm001         2
53  ANA201504100 cainl001         1
54  ANA201504100 cainl001         1
55  ANA201504100 aybae001         1
56  ANA201504100 iannc001         1
57  ANA201504100 joycm001         3
58  ANA201504100 giavj001         1
59  ANA201504100 hosme001         1
60  ANA201504100 morak001         1
61  ANA201504100 gorda001         1
62  ANA201504100 riosa002         1
63  ANA201504100 riosa002         1
64  ANA201504100 calhk001         1
65  ANA201504100 troum001         2
66  ANA201504100 pujoa001         1
67  ANA201504100 freed001         1
68  ANA201504100 peres002         2
69  ANA201504100 infao001         2
70  ANA201504100 escoa003         1
71  ANA201504100 mousm001         1
72  ANA201504100 cainl001         1
73  ANA201504100 hosme001         1
74  ANA201504100 morak001         1
75  ANA201504100 cronc002         1
76  ANA201504100 aybae001         1
77  ANA201504100 iannc001         1
78  ANA201504100 joycm001         1
79  ANA201504110 escoa003         1
80  ANA201504110 mousm001         1
81  ANA201504110 cainl001         1
82  ANA201504110 hosme001         1
83  ANA201504110 calhk001         5
84  ANA201504110 troum001         2
85  ANA201504110 pujoa001         1
86  ANA201504110 joycm001         1
87  ANA201504110 freed001         1
88  ANA201504110 morak001         1
89  ANA201504110 gorda001         1
90  ANA201504110 riosa002         1
91  ANA201504110 aybae001         2
92  ANA201504110 navae001         1
93  ANA201504110 buted001         1
94  ANA201504110 giavj001         1
95  ANA201504110 peres002         1
96  ANA201504110 infao001         1
97  ANA201504110 escoa003         1
98  ANA201504110 giavj001         1
99  ANA201504110 calhk001         1
100 ANA201504110 troum001         1
101 ANA201504110 mousm001         5
102 ANA201504110 cainl001         2
103 ANA201504110 hosme001         1
104 ANA201504110 hosme001         1
105 ANA201504110 morak001         3
106 ANA201504110 gorda001         1
107 ANA201504110 riosa002         2
108 ANA201504110 peres002         5
109 ANA201504110 infao001         2
110 ANA201504110 escoa003         1
111 ANA201504110 pujoa001         1
112 ANA201504110 joycm001         1
113 ANA201504110 freed001         1
114 ANA201504110 mousm001         1
115 ANA201504110 cainl001         1
116 ANA201504110 hosme001         2
117 ANA201504110 morak001         2
118 ANA201504110 gorda001         1
119 ANA201504110 riosa002         1
120 ANA201504110 aybae001         1
121 ANA201504110 navae001         1
122 ANA201504110 buted001         2
123 ANA201504110 giavj001         1
124 ANA201504110 calhk001         3
125 ANA201504110 troum001         2
126 ANA201504110 pujoa001         1
127 ANA201504110 riosa002         1
128 ANA201504110 peres002         2
129 ANA201504110 infao001         1
130 ANA201504110 escoa003         2
131 ANA201504110 mousm001         1
132 ANA201504110 joycm001         1
133 ANA201504110 freed001         1
134 ANA201504110 aybae001         1
135 ANA201504110 cainl001         1
136 ANA201504110 hosme001         1
137 ANA201504110 morak001         2
138 ANA201504110 gorda001         1
139 ANA201504110 riosa002         1
140 ANA201504110 navae001         1
141 ANA201504110 iannc001         1
142 ANA201504110 giavj001         1
143 ANA201504110 peres002         1
144 ANA201504110 infao001         1
145 ANA201504110 escoa003         1
146 ANA201504110 calhk001         1
147 ANA201504110 troum001         1
148 ANA201504110 pujoa001         1
149 ANA201504110 mousm001         2
150 ANA201504110 cainl001         1
151 ANA201504110 hosme001         1
152 ANA201504110 morak001         1
153 ANA201504110 gorda001         1
154 ANA201504110 joycm001         1
155 ANA201504110 freed001         1
156 ANA201504110 aybae001         1

I also made a subset of the new data frame, corresponding to the 40 starting players in those two games.

             GAME.ID PLAYER.ID
1       ANA201504100  escoa003
60465   ANA201504100  mousm001
120929  ANA201504100  cainl001
181393  ANA201504100  hosme001
241857  ANA201504100  morak001
302321  ANA201504100  gorda001
362785  ANA201504100  riosa002
423249  ANA201504100  peres002
483713  ANA201504100  infao001
1117610 ANA201504100  vargj001
573434  ANA201504100  calhk001
633898  ANA201504100  troum001
694362  ANA201504100  pujoa001
754826  ANA201504100  freed001
815290  ANA201504100  cronc002
875754  ANA201504100  aybae001
936218  ANA201504100  iannc001
996682  ANA201504100  cowgc001
1057146 ANA201504100  giavj001
1117613 ANA201504100  santh001
2       ANA201504110  escoa003
60466   ANA201504110  mousm001
120930  ANA201504110  cainl001
181394  ANA201504110  hosme001
241858  ANA201504110  morak001
302322  ANA201504110  gorda001
362786  ANA201504110  riosa002
423250  ANA201504110  peres002
483714  ANA201504110  infao001
2100000 ANA201504110  guthj001
573435  ANA201504110  calhk001
633899  ANA201504110  troum001
694363  ANA201504110  pujoa001
754827  ANA201504110  joycm001
815291  ANA201504110  freed001
875755  ANA201504110  aybae001
936219  ANA201504110  navae001
996683  ANA201504110  buted001
1057147 ANA201504110  giavj001
2100001 ANA201504110  weavj003

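I think there should be a way to add a column to the latter data frame so that each observation takes the GAME.ID and PLAYER.ID entries on its own row, searches the former data frame for the rows where GAME.ID = GAME.ID and PLAYER.ID = BATTER, counts the observations in that subset where HIT.VALUE > 1 (1 = default, 2 = single, 3 = double, 4 = triple, 5 = home run), and returns that count to the observation. In Excel this can be done with the COUNTIF() function, which I can easily copy down the length of the vector. I don't know how to do this in R, though.

1 Answer:

Answer 0 (score: 0):

I think this might be what you're looking for. It groups the second-to-last dataset (the event subset) by GAME.ID and BATTER, then counts the rows with HIT.VALUE > 1 in each group.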

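library(data.table)
# df is assumed to be the 156-row event subset; setDT() converts it to a data.table by reference
dt <- setDT(df)[, list(count_hits = sum(HIT.VALUE > 1)), by = c("GAME.ID", "BATTER")]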

head(dt)
        GAME.ID   BATTER count_hits
1: ANA201504100 escoa003          1
2: ANA201504100 mousm001          1
3: ANA201504100 cainl001          2
4: ANA201504100 hosme001          1
5: ANA201504100 morak001          2
6: ANA201504100 gorda001          0
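
The counting step relies on `sum()` treating `TRUE` as 1 and `FALSE` as 0, which is what makes `sum(HIT.VALUE > 1)` behave like a COUNTIF. A minimal sketch of that step for a single group, using the five HIT.VALUE entries for escoa003 in game ANA201504100 from the sample data (the names `hit_value`, `is_hit`, and `count_hits` are only illustrative):

```r
# HIT.VALUE entries for escoa003 in ANA201504100 (rows 1, 16, 35, 51, 70 of the sample)
hit_value <- c(1, 1, 2, 1, 1)

# Comparison gives a logical vector, one element per plate appearance
is_hit <- hit_value > 1

# Summing a logical vector counts the TRUE elements
count_hits <- sum(is_hit)

print(count_hits)
```

This matches the first row of the grouped output, where escoa003 has count_hits equal to 1.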

Another option, in base R, is:

res <- aggregate(x = list(count_hits = df$HIT.VALUE), by = list(GAME.ID = df$GAME.ID, BATTER = df$BATTER), FUN = function(x) sum(x > 1))

head(res)
       GAME.ID   BATTER count_hits
1 ANA201504100 aybae001          0
2 ANA201504110 aybae001          1
3 ANA201504110 buted001          1
4 ANA201504100 cainl001          2
5 ANA201504110 cainl001          1
6 ANA201504100 calhk001          1
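
Both results have one row per batter who appears in the event data, whereas the question asks for a column on the starters table. One possible way to get there, as a sketch rather than a definitive solution, is to merge the grouped counts onto the starters and fill in 0 for starters with no batting events. The toy data frames `events` and `starters` below are hypothetical stand-ins built from a few rows of the samples above. The `mapply()` call at the end is a literal row-by-row COUNTIF, included only as a cross-check.

```r
# Toy event rows, taken from the sample above
events <- data.frame(
  GAME.ID   = c("ANA201504100", "ANA201504100", "ANA201504100",
                "ANA201504110", "ANA201504110"),
  BATTER    = c("cainl001", "cainl001", "escoa003", "cainl001", "escoa003"),
  HIT.VALUE = c(3, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Toy starters table; vargj001 has no rows in events
starters <- data.frame(
  GAME.ID   = c("ANA201504100", "ANA201504100", "ANA201504100", "ANA201504110"),
  PLAYER.ID = c("cainl001", "escoa003", "vargj001", "cainl001"),
  stringsAsFactors = FALSE
)

# Grouped count, same idea as the aggregate() call above
counts <- aggregate(
  x   = list(count_hits = events$HIT.VALUE),
  by  = list(GAME.ID = events$GAME.ID, BATTER = events$BATTER),
  FUN = function(x) sum(x > 1)
)

# Left join: keep every starter, matching PLAYER.ID against BATTER
result <- merge(
  starters, counts,
  by.x  = c("GAME.ID", "PLAYER.ID"),
  by.y  = c("GAME.ID", "BATTER"),
  all.x = TRUE
)

# Starters without any matching event rows come back as NA; treat them as 0
result$count_hits[is.na(result$count_hits)] <- 0

# Cross-check: a literal per-row COUNTIF over the event data
countif <- mapply(
  function(g, p) sum(events$GAME.ID == g & events$BATTER == p & events$HIT.VALUE > 1),
  result$GAME.ID, result$PLAYER.ID
)

print(result)
```

The grouped-then-merged version scans the event data once, while the `mapply()` version rescans it for every starter row, so the former is likely the better fit for the full 4.8-million-row table.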