Summarizing in R

Time: 2015-10-22 14:08:45

Tags: r

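I have been working with the following CSV file, http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv, purely for my own practice.

But I'm not sure how to do the following:

  • Summarize the observations for the same team (teamID) in the same year by adding up the component values. That is, each team should have at most one record per year, and that record should contain year, team name, total runs, total hits, total X2B, ..., total HBP.

This is my code so far, but it gives me only one team per year, whereas I need the totals for every team in every year (for example, for 1980 I need each team's total runs, total hits, ...; for 1981, each team's total runs, total hits, ...; and so on).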

newdat1 <- read.csv("http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv")

# row indices grouped by year
id <- split(1:nrow(newdat1), newdat1$yearID)

# one row per group: first year/team seen in the group, plus column totals
a2 <- data.frame(yearID=sapply(id, function(i) newdat1$yearID[i[1]]),
    teamID=sapply(id, function(i) newdat1$teamID[i[1]]),
    totalRuns=sapply(id, function(i) sum(newdat1$R[i], na.rm=TRUE)),
    totalHits=sapply(id, function(i) sum(newdat1$H[i], na.rm=TRUE)),
    totalX2B=sapply(id, function(i) sum(newdat1$X2B[i], na.rm=TRUE)),
    totalX3B=sapply(id, function(i) sum(newdat1$X3B[i], na.rm=TRUE)),
    totalHR=sapply(id, function(i) sum(newdat1$HR[i], na.rm=TRUE)),
    totalBB=sapply(id, function(i) sum(newdat1$BB[i], na.rm=TRUE)),
    totalSB=sapply(id, function(i) sum(newdat1$SB[i], na.rm=TRUE)),
    totalGIDP=sapply(id, function(i) sum(newdat1$GIDP[i], na.rm=TRUE)),
    totalIBB=sapply(id, function(i) sum(newdat1$IBB[i], na.rm=TRUE)),
    totalHBP=sapply(id, function(i) sum(newdat1$HBP[i], na.rm=TRUE)))
a2
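A note on the code above: the split is on yearID alone, so each group covers a whole season and i[1] simply picks whichever team comes first in that year. Below is a minimal sketch of the same approach split on both keys. The data frame `toy` and its values are made up for illustration and are not taken from the real file.

```r
# Hypothetical toy data standing in for Batting.csv (values are made up)
toy <- data.frame(
  yearID = c(1980, 1980, 1980, 1981),
  teamID = c("BOS", "BOS", "NYA", "BOS"),
  R      = c(10, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Splitting on yearID alone gives one group per year
by_year <- split(seq_len(nrow(toy)), toy$yearID)

# Splitting on both keys gives one group per (year, team) pair;
# drop = TRUE discards combinations that never occur in the data
by_year_team <- split(seq_len(nrow(toy)),
                      list(toy$yearID, toy$teamID), drop = TRUE)

totals <- data.frame(
  yearID    = sapply(by_year_team, function(i) toy$yearID[i[1]]),
  teamID    = sapply(by_year_team, function(i) toy$teamID[i[1]]),
  totalRuns = sapply(by_year_team, function(i) sum(toy$R[i], na.rm = TRUE))
)
totals
```

The remaining total columns can be added the same way, one sapply() call per statistic.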

3 answers:

Answer 0: (score: 1)

Maybe try something like:

library("dplyr")
newdat1 %>%
    group_by(yearID, teamID) %>%
    summarize_each(funs(sum(., na.rm = TRUE)), R, H, X2B,
                   X3B, HR, BB, SB, GIDP, IBB, HBP)

Of course, this is most useful if you are comfortable with the dplyr library. It is a guess, made without looking at the data too closely.

Also, instead of listing every column you want to summarize, you can exclude the ones you don't want:

 summarize_each(funs(sum(., na.rm = TRUE)), -column_to_exclude1, -column_to_exclude2)

and so on.
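In more recent dplyr releases, summarize_each() and funs() are deprecated in favour of across(). Here is a minimal sketch of the same grouped sum, assuming dplyr 1.0.0 or later. The data frame `toy` is a made-up stand-in for the real Batting data.

```r
library(dplyr)

# Hypothetical toy data standing in for Batting.csv (values are made up)
toy <- data.frame(
  yearID = c(1980, 1980, 1980, 1981),
  teamID = c("BOS", "BOS", "NYA", "BOS"),
  R      = c(10, 5, 7, NA),
  H      = c(20, NA, 15, 8),
  stringsAsFactors = FALSE
)

# One row per (yearID, teamID); each listed column is summed, ignoring NAs
totals <- toy %>%
  group_by(yearID, teamID) %>%
  summarise(across(c(R, H), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

totals
```

Note that a group whose values are all NA sums to 0 when na.rm = TRUE, which may or may not be what you want.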

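Answer 1: (score: 0)

I suggest looking at ddply in the plyr package. See here for an example of what I think you are trying to do.

For this example, try the following code: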

# ddply function in the plyr package

library(plyr)

# summarize the dataframe newdat1, using yearID and teamID as grouping variables

outputdat <- ddply(newdat1, c("yearID", "teamID"), summarize,
                   totalRuns = sum(R),   # add all summary variables you need here...
                   totalHits = sum(H),   # other summary functions (mean, sd etc) also work
                   totalX2B  = sum(X2B))

Hope that helps.

Answer 2: (score: 0)

library(plyr)
ddply(newdat1, ~ teamID + yearID, summarize, sum(R), sum(X2B), sum(SO), sum(IBB), sum(HBP))

If needed, use sum(..., na.rm = TRUE).
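The na.rm argument matters because sum() returns NA as soon as any value in a group is missing. Here is a small base R sketch of the effect, with aggregate() doing the grouping. The data frame `toy` and its values are made up.

```r
# Hypothetical toy data (values are made up); one HBP value is missing
toy <- data.frame(
  yearID = c(1980, 1980, 1981),
  teamID = c("BOS", "BOS", "BOS"),
  R      = c(10, 5, 7),
  HBP    = c(2, NA, 3),
  stringsAsFactors = FALSE
)

with_na    <- sum(toy$HBP)                # NA: one missing value poisons the sum
without_na <- sum(toy$HBP, na.rm = TRUE)  # missing value is skipped

# Grouped totals in base R; na.rm = TRUE is passed through to sum()
totals <- aggregate(toy[c("R", "HBP")],
                    by = list(yearID = toy$yearID, teamID = toy$teamID),
                    FUN = sum, na.rm = TRUE)
totals
```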

The data.table package can do this too:

library(data.table)
DT <- as.data.table(newdat1[,-c(1,5)])
setkey(DT, teamID, yearID)
DT[, lapply(.SD, sum, na.rm=TRUE), .(teamID, yearID)]
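Dropping columns by position depends on the column order of the file. A possible variation is to name the columns to sum through .SDcols instead. In the sketch below, the toy table and the choice of columns in `stat_cols` are illustrative assumptions, not taken from the real data.

```r
library(data.table)

# Hypothetical toy table standing in for Batting.csv (values are made up)
DT <- data.table(
  playerID = c("a", "b", "c", "a"),
  yearID   = c(1980, 1980, 1980, 1981),
  teamID   = c("BOS", "BOS", "NYA", "BOS"),
  R        = c(10, 5, 7, 3),
  HBP      = c(1, NA, 2, 0)
)

# Columns to total; non-numeric columns such as playerID are never touched
stat_cols <- c("R", "HBP")

totals <- DT[, lapply(.SD, sum, na.rm = TRUE),
             by = .(teamID, yearID), .SDcols = stat_cols]
totals
```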