Subsetting data with duplicate names

Date: 2016-06-30 19:21:29

Tags: r

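I have a cricket data set in which player names repeat, each row giving a player and the runs scored. My question is: how many players have scored more than 5000 runs in total? I also want to form a subset of those players and their runs. A glimpse of the data is below.

"Player"    "Runs"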
SM Gavaskar 28
SS Naik 18
AL Wadekar  67
GR Viswanath    4
FM Engineer 32
BP Patel    82
ED Solkar   3
S Abid Ali  17
S Madan Lal 2
S Venkataraghavan   1
BS Bedi 0
SM Gavaskar 20
SS Naik 20
GK Bose 13
AL Wadekar  6
GR Viswanath    32
FM Engineer 4
BP Patel    12
AV Mankad   44
ED Solkar   0
S Abid Ali  6
S Madan Lal 3
SM Gavaskar 36
ED Solkar   8
AD Gaekwad  22
GR Viswanath    37
BP Patel    16
S Abid Ali  
KD Ghavri   
M Amarnath  
FM Engineer 
S Madan Lal 
S Venkataraghavan   
SM Gavaskar 65
FM Engineer 54

Please suggest a method. In Excel, we would remove duplicates and apply SUMIF. How would this be done in R?

1 Answer:

Answer 0: (score: 1)

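Assuming you have saved the data from Excel as a csv file in which the first column, named 'player', holds the player name and the second column, named 'runs', holds the number of runs: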

dat <- read.csv("cricket.csv", header=TRUE) # read in the data
dat.nodup <- tapply(dat$runs, dat$player, function(x) sum(x, na.rm=TRUE)) # sum runs for each player with duplicate observations
dat.gt5000 <- dat.nodup[which(dat.nodup > 5000)] # keep only records with > 5000 runs
length(dat.gt5000) # Number of players with > 5000 runs
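
As a self-contained sketch of the same idea, the following uses `aggregate()` on a few rows copied from the question. It returns a data frame rather than a named vector, which makes it easy to pull the matching rows back out of the original data. The column names `player` and `runs` follow the assumption stated in this answer, and the threshold of 100 is a hypothetical stand-in for 5000, chosen only because the sample is tiny.

```r
# A few rows copied from the question's sample; NA marks a missing score
dat <- data.frame(
  player = c("SM Gavaskar", "SS Naik", "AL Wadekar", "SM Gavaskar",
             "SS Naik", "AL Wadekar", "SM Gavaskar", "S Abid Ali",
             "SM Gavaskar"),
  runs   = c(28, 18, 67, 20, 20, 6, 36, NA, 65),
  stringsAsFactors = FALSE
)

# Total runs per player (the R counterpart of "remove duplicates + SUMIF").
# na.action = na.pass keeps rows with missing runs, and na.rm = TRUE is
# passed on to sum() so those rows count as zero.
totals <- aggregate(runs ~ player, data = dat, FUN = sum,
                    na.rm = TRUE, na.action = na.pass)

threshold <- 100  # hypothetical; use 5000 on the full data set

# Players whose total exceeds the threshold, and how many there are
top <- totals[totals$runs > threshold, ]
nrow(top)

# Subset of the original rows belonging to those players
top.rows <- dat[dat$player %in% top$player, ]
```

Here `top` holds one row per qualifying player with the summed runs, while `top.rows` keeps every individual innings for those players.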