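I have been working with the following CSV file, http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv, purely for my own practice.
But I am not sure how to produce output of the following form: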
year, team name, total runs, total hits, total X2B ,…. Total HBP
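Here is my code so far. It gives me only one team per year, but I need the totals for every team in every year (for example, for 1980 I need every team with its total runs, total hits, ...; for 1981, every team with its totals, and so on).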
newdat1 <- read.csv("http://www3.amherst.edu/~nhorton/r2/datasets/Batting.csv")
id <- split(1:nrow(newdat1), newdat1$yearID)
a2 <- data.frame(yearID=sapply(id, function(i) newdat1$yearID[i[1]]),
                 teamID=sapply(id, function(i) newdat1$teamID[i[1]]),
                 totalRuns=sapply(id, function(i) sum(newdat1$R[i],na.rm=TRUE)),
                 totalHits=sapply(id, function(i) sum(newdat1$H[i],na.rm=TRUE)),
                 totalX2B=sapply(id, function(i) sum(newdat1$X2B[i],na.rm=TRUE)),
                 totalX3B=sapply(id, function(i) sum(newdat1$X3B[i],na.rm=TRUE)),
                 totalHR=sapply(id, function(i) sum(newdat1$HR[i],na.rm=TRUE)),
                 totalBB=sapply(id, function(i) sum(newdat1$BB[i],na.rm=TRUE)),
                 totalSB=sapply(id, function(i) sum(newdat1$SB[i],na.rm=TRUE)),
                 totalGIDP=sapply(id, function(i) sum(newdat1$GIDP[i],na.rm=TRUE)),
                 totalIBB=sapply(id, function(i) sum(newdat1$IBB[i],na.rm=TRUE)),
                 totalHBP=sapply(id, function(i) sum(newdat1$HBP[i],na.rm=TRUE)))
a2
Answer 0 (score: 1)
Maybe try something like:
library("dplyr")
newdat1 %>%
  group_by(yearID, teamID) %>%
  summarize_each(funs(sum(., na.rm = TRUE)), R, H, X2B,
                 X3B, HR, BB, SB, GIDP, IBB, HBP)
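Of course, this is only useful if you are comfortable with the dplyr library. It is a guess, made without looking too closely at the data.
Also, instead of listing every column you want to summarize, you can optionally write
summarize_each(funs(sum(., na.rm = TRUE)), -column_to_exclude1, -column_to_exclude2)
and so on.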
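As a side note, summarize_each() and funs() are deprecated in current dplyr releases. A rough equivalent using across() (assuming dplyr 1.0 or later) might look like the sketch below. The toy data frame is made up for illustration and only stands in for the Batting file.

```r
library(dplyr)

# Hypothetical toy data standing in for the Batting file
toy <- data.frame(
  yearID = c(1980, 1980, 1980, 1981),
  teamID = c("BOS", "BOS", "NYA", "BOS"),
  R      = c(10, 5, 7, NA),
  H      = c(20, NA, 15, 9)
)

# One row per year/team combination, NA values ignored in each sum
totals <- toy %>%
  group_by(yearID, teamID) %>%
  summarise(across(c(R, H), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(totals)
```

With the real data, the column list inside across() would be replaced by R, H, X2B, and the other columns named in the question.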
Answer 1 (score: 0)
I suggest looking at ddply in the plyr package. See here for what I think you are trying to do.
For this example, try the following code:
# ddply function in the plyr package
library(plyr)
# summarize the dataframe newdat1, using yearID and teamID as grouping variables
outputdat <- ddply(newdat1, c("yearID", "teamID"), summarize,
                   totalRuns = sum(R),   # add all summary variables you need here...
                   totalHits = sum(H),   # other summary functions (mean, sd etc) also work
                   totalX2B  = sum(X2B))
Hope that helps!
Answer 2 (score: 0)
library(plyr)
ddply(newdat1, ~ teamID + yearID, summarize, sum(R), sum(X2B), sum(SO), sum(IBB), sum(HBP))
If needed, use sum(..., na.rm = TRUE).
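To see why na.rm can matter here, a tiny sketch (the vector is made up): a single missing value turns the whole sum into NA unless it is dropped.

```r
# Made-up values with one missing entry
x <- c(3, NA, 4)

total_default <- sum(x)                # NA propagates through the sum
total_clean   <- sum(x, na.rm = TRUE)  # missing value is dropped first

print(total_default)
print(total_clean)
```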
The same can be done with {data.table}:
library(data.table)
DT <- as.data.table(newdat1[,-c(1,5)])
setkey(DT, teamID, yearID)
DT[, lapply(.SD, sum, na.rm=TRUE), .(teamID, yearID)]
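If you would rather avoid extra packages, base R's aggregate() can probably do the same grouping by year and team. This is only a sketch on made-up data (the toy data frame is hypothetical and not taken from the Batting file). The column names simply mirror those in the question.

```r
# Hypothetical toy data standing in for the Batting file
toy <- data.frame(
  yearID = c(1980, 1980, 1980, 1981),
  teamID = c("BOS", "BOS", "NYA", "BOS"),
  R      = c(10, 5, 7, NA),
  H      = c(20, NA, 15, 9)
)

# Sum the stat columns within each year/team group;
# na.rm = TRUE is passed through to sum()
totals <- aggregate(toy[c("R", "H")],
                    by = toy[c("yearID", "teamID")],
                    FUN = sum, na.rm = TRUE)

print(totals)
```

The data-frame form of aggregate() is used here rather than the formula form, because the formula form drops rows containing NA by default before summing.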