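I'm trying to get used to data.table notation and would like to clean up this code a bit. I feel there must be a better, less memory-hungry way to do this. I need to compute some basic metrics from an existing data frame. Can I do it without creating several data tables? Also, how do I handle the NaN that appears when the denominator is 0? I would like it to show 0 instead.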
library("Lahman")
library("ggplot2")
library("data.table")
DT <- na.omit(data.table(PlayerId = Batting$playerID, SB = Batting$SB,
                         CS = Batting$CS, G = Batting$G))
DTa <- (DT[, list(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)),
           by = 'PlayerId'])
DTb <- (DTa[,
            list(PlayerId, TotalSB, TotalCS, TotalG,
                 SBAttempts = TotalSB + TotalCS,
                 SBSuccess = TotalSB / (TotalSB + TotalCS),
                 SBPerGame = TotalSB / TotalG)
            ])
print(DTb)
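The NaN mentioned in the question comes from 0/0: a player with no stolen-base attempts has TotalSB + TotalCS equal to 0, and R evaluates 0/0 as NaN rather than raising an error. The minimal sketch below reproduces this on a small hypothetical table (the player IDs and counts are made up, and only the data.table package is assumed to be installed).

```r
library(data.table)

# Hypothetical toy data: player "b" never attempted a steal
toy <- data.table(PlayerId = c("a", "a", "b"),
                  SB = c(3L, 2L, 0L), CS = c(1L, 0L, 0L))

agg <- toy[, list(TotalSB = sum(SB), TotalCS = sum(CS)), by = "PlayerId"]
agg[, SBSuccess := TotalSB / (TotalSB + TotalCS)]

print(agg)
# Player "b" gets 0 / 0, i.e. NaN. is.na() is also TRUE for NaN,
# so is.nan() is the stricter test when only 0/0 results should be caught
is.nan(agg$SBSuccess)
# FALSE TRUE
```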
Answer (score: 3)
Well, here is a slightly more compact way:
# don't need quotes in `by=...`
DTa <- DT[, list(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)),
          by = PlayerId]
# use c(...) := list(...) to add multiple columns by reference
DTa[, c("SBAttempts", "SBSuccess", "SBPerGame") :=
        list(TotalSB + TotalCS, TotalSB / (TotalSB + TotalCS), TotalSB / TotalG)]
# replace NaN with 0 in columns 5:7 (SBAttempts, SBSuccess, SBPerGame)
DTa[, names(DTa)[5:7] := lapply(.SD, function(x) ifelse(is.nan(x), 0, x)), .SDcols = 5:7]
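Positional indices such as 5:7 silently target the wrong columns if columns are later added or reordered. One alternative, sketched below on a small hypothetical stand-in for DTa (values made up), names the ratio columns explicitly and uses data.table's set(), a low-overhead, loopable form of := that also modifies by reference and touches only the rows where is.nan() is TRUE. Listing only SBSuccess and SBPerGame is an assumption: SBAttempts is an integer sum and cannot be NaN.

```r
library(data.table)

# Hypothetical stand-in for DTa after the := step (values made up)
DTa <- data.table(PlayerId   = c("a", "b"),
                  TotalSB    = c(5L, 0L),
                  TotalCS    = c(1L, 0L),
                  TotalG     = c(22L, 5L),
                  SBAttempts = c(6L, 0L),
                  SBSuccess  = c(5 / 6, NaN),
                  SBPerGame  = c(5 / 22, 0))

# Ratio columns addressed by name instead of by position
ratio_cols <- c("SBSuccess", "SBPerGame")

# set() assigns by reference; i selects only the NaN rows of column j
# (an empty i, as for SBPerGame here, is simply a no-op)
for (j in ratio_cols) {
  set(DTa, i = which(is.nan(DTa[[j]])), j = j, value = 0)
}

print(DTa)
```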
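This creates a single new data table, DTa, which is unavoidable because it has fewer rows than the original. The extra columns (SBAttempts, SBSuccess, SBPerGame) are then added, and the NaN values converted to 0, by reference (without copying).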
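Taking the question's goal one step further, the aggregation and the derived columns can be chained into one expression so that only one new table is ever created, and the NaN can be avoided at the source instead of patched afterwards. This is a sketch under stated assumptions: it uses made-up toy data in place of Lahman's Batting, and fifelse() requires data.table 1.12.4 or later.

```r
library(data.table)

# Hypothetical stand-in for the Lahman Batting subset (values made up)
DT <- data.table(PlayerId = c("a", "a", "b"),
                 SB = c(3L, 2L, 0L), CS = c(1L, 0L, 0L), G = c(10L, 12L, 5L))

# Aggregate once, then chain [...] calls that add columns by reference to
# that result. fifelse() returns 0 where there were no attempts, so
# SBSuccess never ends up as NaN.
res <- DT[, .(TotalSB = sum(SB), TotalCS = sum(CS), TotalG = sum(G)), by = PlayerId
          ][, SBAttempts := TotalSB + TotalCS
          ][, `:=`(SBSuccess = fifelse(SBAttempts == 0L, 0, TotalSB / SBAttempts),
                   SBPerGame = TotalSB / TotalG)]

print(res)
```

SBAttempts gets its own step because the right-hand sides inside a single `:=` call can only see columns that existed before that call.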