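I have a subset of cricket data in which player names repeat, each with the runs scored in an innings. My question: how many players have scored more than 5000 runs in total? I would also like to form a subset of those players and their run totals. A glimpse of the data is below.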
"Player" "Runs"
SM Gavaskar 28
SS Naik 18
AL Wadekar 67
GR Viswanath 4
FM Engineer 32
BP Patel 82
ED Solkar 3
S Abid Ali 17
S Madan Lal 2
S Venkataraghavan 1
BS Bedi 0
SM Gavaskar 20
SS Naik 20
GK Bose 13
AL Wadekar 6
GR Viswanath 32
FM Engineer 4
BP Patel 12
AV Mankad 44
ED Solkar 0
S Abid Ali 6
S Madan Lal 3
SM Gavaskar 36
ED Solkar 8
AD Gaekwad 22
GR Viswanath 37
BP Patel 16
S Abid Ali
KD Ghavri
M Amarnath
FM Engineer
S Madan Lal
S Venkataraghavan
SM Gavaskar 65
FM Engineer 54
Please suggest a method. In Excel, we would remove duplicates and apply SUMIF. How can this be done in R?
Answer 0 (score: 1)
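Assuming you have the data in a CSV file (for example, saved from Excel) in which the first column is named 'player' and holds the player name, and the second column is named 'runs' and holds the runs scored.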
dat <- read.csv("cricket.csv", header=TRUE) # read in the data
dat.nodup <- tapply(dat$runs, dat$player, function(x) sum(x, na.rm=TRUE)) # sum runs for each player with duplicate observations
dat.gt5000 <- dat.nodup[which(dat.nodup > 5000)] # keep only records with > 5000 runs
length(dat.gt5000) # Number of players with > 5000 runs
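If a data frame of players and totals is preferred over a named vector, the same idea can be sketched with aggregate(). This is only an illustration under stated assumptions: the data frame below is typed in from a few rows of the question rather than read from cricket.csv, and the cutoff of 100 is a hypothetical stand-in so that the small sample returns something; on the full data it would be 5000.

```r
# A few rows copied from the question; NA marks an innings with no recorded runs.
dat <- data.frame(
  player = c("SM Gavaskar", "FM Engineer", "SM Gavaskar", "FM Engineer",
             "SM Gavaskar", "FM Engineer", "SM Gavaskar", "FM Engineer"),
  runs   = c(28, 32, 20, 4, 36, NA, 65, 54)
)

# Total runs per player, one row per player (the remove-duplicates + SUMIF step).
# na.action = na.pass keeps rows with NA runs so sum(..., na.rm = TRUE) can skip them.
totals <- aggregate(runs ~ player, data = dat, FUN = sum,
                    na.rm = TRUE, na.action = na.pass)

threshold <- 100  # hypothetical cutoff for this small sample; use 5000 on the full data
top <- totals[totals$runs > threshold, ]

nrow(top)  # number of players above the cutoff
top        # those players together with their total runs
```

Here `totals` plays the role of dat.nodup above, and `top` is the requested subset of players with their run totals.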