I have a data frame with three columns:
ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1
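I want to combine the observations by ID and add a column with the sum of the scores (Total Score), a column with the number of attempts (including the ones they got wrong), and the percentage correct, like this: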
ID Class Total Score Number Attempted Percent
abc123 Science 1 2 50
jkl456 Math 0 2 0
zpe789 English 1 1 100
yth293 Art 1 1 100
Is there an R package or function that can collapse the data by ID and the corresponding Class and produce these results? Thanks.
Answer 0 (score: 1)
Try:
library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(TotalScore = sum(Score),
            NumberAttempted = n(),
            Percent = TotalScore/NumberAttempted*100)
#Source: local data frame [4 x 4]
#
# ID TotalScore NumberAttempted Percent
#1 abc123 1 2 50
#2 jkl456 0 2 0
#3 yth293 1 1 100
#4 zpe789 1 1 100
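The code above groups by ID only, so the Class column shown in the desired output is dropped. The following is a minimal sketch that keeps it by grouping on both ID and Class. It assumes dplyr 1.0.0 or later (for the `.groups` argument of `summarise()`) and re-enters the question's data as `df`. Because each ID appears with a single Class in this data, the counts are unchanged. It also computes Percent as `mean(Score) * 100`, which matches TotalScore/NumberAttempted*100 only because Score is coded 0/1.

```r
library(dplyr)

# Data re-entered from the question
df <- data.frame(
  ID    = c("abc123", "jkl456", "zpe789", "abc123", "jkl456", "yth293"),
  Class = c("Science", "Math", "English", "Science", "Math", "Art"),
  Score = c(1, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Grouping by both ID and Class keeps Class in the summary
  group_by(ID, Class) %>%
  summarise(TotalScore      = sum(Score),
            NumberAttempted = n(),
            # Score is 0/1, so the mean is the fraction answered correctly
            Percent         = mean(Score) * 100,
            .groups = "drop")
res
```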
To show how much the dplyr package helps, here is a similar solution that uses only base R:
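# tapply() applies a function to Score separately within each ID
newdf <- data.frame(TotalScore = with(df, tapply(Score, ID, FUN = sum)))
# length() counts the attempts per ID, whether correct or not
newdf$NumberAttempted <- with(df, tapply(Score, ID, FUN = length))
newdf$Percent <- 100 * newdf$TotalScore / newdf$NumberAttempted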
newdf
# TotalScore NumberAttempted Percent
#abc123 1 2 50
#jkl456 0 2 0
#yth293 1 1 100
#zpe789 1 1 100
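As with the dplyr version, the `tapply()` solution keeps only ID (as row names). A base R sketch that also keeps Class could instead use `aggregate()` with a formula and join the two summaries with `merge()`. The column names below are borrowed from the answer, and the data are re-entered from the question; this is one possible approach, not the only one.

```r
# Data re-entered from the question
df <- data.frame(
  ID    = c("abc123", "jkl456", "zpe789", "abc123", "jkl456", "yth293"),
  Class = c("Science", "Math", "English", "Science", "Math", "Art"),
  Score = c(1, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Sum of Score and number of rows for every ID/Class combination
tot <- aggregate(Score ~ ID + Class, data = df, FUN = sum)
cnt <- aggregate(Score ~ ID + Class, data = df, FUN = length)
names(tot)[names(tot) == "Score"] <- "TotalScore"
names(cnt)[names(cnt) == "Score"] <- "NumberAttempted"

# Join the two summaries on the grouping columns, then add the percentage
newdf <- merge(tot, cnt, by = c("ID", "Class"))
newdf$Percent <- 100 * newdf$TotalScore / newdf$NumberAttempted
newdf
```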
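As a closing note, variable names that contain spaces (such as `Total Score` and `Number Attempted` in the desired output) can make further analysis harder.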
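To illustrate the point about spaces: R only keeps a name like `Total Score` if you ask for it (here with `check.names = FALSE`), and every later reference then needs backticks. A small sketch using a hypothetical data frame `scores`; `make.names()` converts such names into syntactic ones.

```r
# check.names = FALSE keeps the space; the default would give "Total.Score"
scores <- data.frame(`Total Score` = c(1, 0), check.names = FALSE)
names(scores)

# A name containing a space must be wrapped in backticks
scores$`Total Score`

# Converting to a syntactic name removes the need for backticks
names(scores) <- make.names(names(scores))
names(scores)
scores$Total.Score
```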
Answer 1 (score: 0)
df <- read.table(textConnection("ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1"), header = TRUE)
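As an aside, `read.table()` also accepts literal data through its `text` argument, so the `textConnection()` wrapper is optional. A minimal sketch of the same import:

```r
# The text argument reads the data directly from a character string
df <- read.table(text = "ID Class Score
abc123 Science 1
jkl456 Math 0
zpe789 English 1
abc123 Science 0
jkl456 Math 0
yth293 Art 1", header = TRUE)
str(df)
```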
Then do:
library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(Total_Score = sum(Score),
            Number_Attempted = n(),
            Percent = (Total_Score / Number_Attempted) * 100)