Combine rows/observations by column value in R

Time: 2015-06-18 15:45:04

Tags: r packages

I have a data frame with three columns:

 ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1

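I want to combine the observations by ID and add a column that sums the score (Total Score), a column showing how many questions each ID attempted (even if they got them wrong), and a column with the percent correct, like this: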

 ID        Class     Total Score     Number Attempted      Percent
 abc123    Science   1               2                      50
 jkl456    Math      0               2                       0
 zpe789    English   1               1                     100
 yth293    Art       1               1                     100

Is there an R package or function that can collapse the rows by ID and the corresponding Class and produce these results? Thanks.

2 answers:

Answer 0: (score: 1)

Try:

library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(TotalScore = sum(Score), 
            NumberAttempted=n(), 
            Percent = TotalScore/NumberAttempted*100)
#Source: local data frame [4 x 4]
#
#      ID TotalScore NumberAttempted Percent
#1 abc123          1               2      50
#2 jkl456          0               2       0
#3 yth293          1               1     100
#4 zpe789          1               1     100
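
The desired output in the question also keeps the Class column, which grouping by ID alone drops. A minimal sketch of one way to keep it is to group by both columns. The name by_class is made up for illustration, the data frame is rebuilt from the question so the snippet runs on its own, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Sample data from the question, rebuilt so the sketch is self-contained
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Grouping by ID and Class together keeps Class in the summarised result
by_class <- df %>%
  group_by(ID, Class) %>%
  summarise(TotalScore = sum(Score),
            NumberAttempted = n(),
            Percent = 100 * TotalScore / NumberAttempted,
            .groups = "drop")   # assumption: dplyr >= 1.0.0
by_class
```

If an ID could appear under more than one Class, this gives one row per ID/Class pair rather than one row per ID.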

To demonstrate the usefulness of the dplyr package, here is a similar solution that doesn't use any special packages.

newdf <- data.frame(TotalScore = with(df, tapply(Score, ID, FUN=sum)))
newdf$NumberAttempted <- with(df, tapply(Score, ID, FUN=length))
newdf$Percent <- 100*newdf$TotalScore/newdf$NumberAttempted
newdf
#       TotalScore NumberAttempted Percent
#abc123          1               2      50
#jkl456          0               2       0
#yth293          1               1     100
#zpe789          1               1     100
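
If Class should also appear in the base R result, aggregate() with a formula is one possible route. This is only a sketch: the name res is made up for illustration, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Sample data from the question, rebuilt so the sketch is self-contained
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)

# Sum of scores per ID/Class pair
res <- aggregate(Score ~ ID + Class, data = df, FUN = sum)
names(res)[names(res) == "Score"] <- "TotalScore"

# Number of attempts per ID/Class pair; the same formula and data
# give the same row order, so the column can be attached directly
res$NumberAttempted <- aggregate(Score ~ ID + Class, data = df, FUN = length)$Score

res$Percent <- 100 * res$TotalScore / res$NumberAttempted
res
```

Unlike the tapply() version, the IDs end up in a regular column instead of the row names.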

As a closing note, variable names that contain spaces can make further analysis difficult.
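
To illustrate the point about spaces, here is a small sketch. The data frame spaced and its values are made up for illustration; make.names() and gsub() are base R functions.

```r
# A data frame whose column names contain spaces (check.names = FALSE keeps them)
spaced <- data.frame(ID = c("abc123", "jkl456"),
                     `Total Score` = c(1, 0),
                     `Number Attempted` = c(2, 2),
                     check.names = FALSE)

# Every reference to such a column needs backticks
pct <- 100 * spaced$`Total Score` / spaced$`Number Attempted`

# make.names() turns the names into syntactically valid ones
# by replacing each space with a dot
dotted <- make.names(names(spaced))

# Alternatively, replace spaces with underscores
underscored <- gsub(" ", "_", names(spaced))
names(spaced) <- underscored
```

After the renaming, the columns can be used as spaced$Total_Score and spaced$Number_Attempted without backticks.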

Answer 1: (score: 0)

df <- read.table(textConnection("ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1"), header = TRUE)

Then do:

library(dplyr)
df %>% group_by(ID) %>% 
  summarise("Total_Score" = sum(Score),
            "Number_Attempted" = n(),
            "Percent" = (Total_Score/Number_Attempted)*100)