Question

I have a data frame with three columns:

 ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1

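I want to combine the observations by ID and add columns that give the total score, the number of attempts (counting the ones they got wrong), and the percentage correct, like this: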

 ID        Class     Total Score     Number Attempted      Percent
 abc123    Science   1               2                      50
 jkl456    Math      0               2                       0
 zpe789    English   1               1                     100
 yth293    Art       1               1                     100

Is there an R package or function that will collapse the rows by ID and the corresponding Class and produce these results? Thanks.

Answer 1

Try:

library(dplyr)
df %>%
  group_by(ID) %>%                                   # one group per ID
  summarize(TotalScore = sum(Score),                 # sum of the 0/1 scores
            NumberAttempted = n(),                   # rows in the group
            Percent = TotalScore/NumberAttempted*100)
#Source: local data frame [4 x 4]
#
#      ID TotalScore NumberAttempted Percent
#1 abc123          1               2      50
#2 jkl456          0               2       0
#3 yth293          1               1     100
#4 zpe789          1               1     100

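To show how much dplyr helps, here is a similar solution that uses no extra packages.

# tapply() returns a vector named by ID, so the IDs end up as row names
newdf <- data.frame(TotalScore = with(df, tapply(Score, ID, FUN=sum)))
newdf$NumberAttempted <- with(df, tapply(Score, ID, FUN=length))
newdf$Percent <- 100*newdf$TotalScore/newdf$NumberAttempted
newdf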
#       TotalScore NumberAttempted Percent
#abc123          1               2      50
#jkl456          0               2       0
#yth293          1               1     100
#zpe789          1               1     100
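
The tapply() version above keeps ID only as row names and drops Class. As a rough base R sketch, assuming each ID belongs to exactly one Class (true for the sample data, not guaranteed in general), aggregate() with a two-variable formula keeps both as ordinary columns. The names total, attempts and res are placeholders chosen for this example.

```r
# Sample data from the question
df <- data.frame(
  ID    = c("abc123", "jkl456", "zpe789", "abc123", "jkl456", "yth293"),
  Class = c("Science", "Math", "English", "Science", "Math", "Art"),
  Score = c(1, 0, 1, 0, 0, 1)
)

# Sum of scores per ID/Class pair
total <- aggregate(Score ~ ID + Class, data = df, FUN = sum)
names(total)[3] <- "TotalScore"

# Number of rows (attempts) per ID/Class pair
attempts <- aggregate(Score ~ ID + Class, data = df, FUN = length)
names(attempts)[3] <- "NumberAttempted"

# Join the two summaries on the grouping columns, then derive the percentage
res <- merge(total, attempts, by = c("ID", "Class"))
res$Percent <- 100 * res$TotalScore / res$NumberAttempted
res
```

If an ID can appear under several classes, this produces one row per ID/Class pair rather than one row per ID.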

As a closing note, variable names that contain spaces can make further analysis awkward.
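
To illustrate the point about spaces, here is a small base R sketch. The data frame out and its values are made up for the example; only the column names come from the question. A column whose name contains a space has to be wrapped in backticks every time it is used, so replacing the spaces is usually less error-prone.

```r
# check.names = FALSE stops data.frame() from rewriting the names
out <- data.frame(`Total Score` = c(1, 0),
                  `Number Attempted` = c(2, 2),
                  check.names = FALSE)

# Backticks are required to refer to a name that contains a space
pct_before <- 100 * out$`Total Score` / out$`Number Attempted`

# Replace spaces with underscores so the names are syntactically valid
names(out) <- gsub(" ", "_", names(out))
pct_after <- 100 * out$Total_Score / out$Number_Attempted
```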

Answer 2

df <- read.table(textConnection("ID        Class     Score
 abc123    Science   1
 jkl456    Math      0
 zpe789    English   1
 abc123    Science   0
 jkl456    Math      0
 yth293    Art       1"), header = TRUE)

Then do:

library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(Total_Score = sum(Score),        # sum of the 0/1 scores
            Number_Attempted = n(),          # rows per ID
            Percent = (Total_Score/Number_Attempted)*100)
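
The summarise() call above returns ID and the three computed columns, while the table in the question also shows Class. One possible way to carry it through, assuming each ID maps to a single Class as in the sample data, is to group by both columns. This is only a sketch; res is a name chosen for the example.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID    = c("abc123", "jkl456", "zpe789", "abc123", "jkl456", "yth293"),
  Class = c("Science", "Math", "English", "Science", "Math", "Art"),
  Score = c(1, 0, 1, 0, 0, 1)
)

res <- df %>%
  group_by(ID, Class) %>%                    # grouping columns are kept in the result
  summarise(Total_Score = sum(Score),
            Number_Attempted = n(),
            Percent = 100 * Total_Score / Number_Attempted) %>%
  ungroup()                                  # drop any leftover grouping
res
```

Depending on the dplyr version, summarise() may print an informational message about how the result is grouped; the ungroup() at the end makes the returned table ungrouped either way.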

Combine rows/observations by column value in R
