Let's say I have the following data.table in R:
library(data.table)
dt <- data.table(ID = paste0("x", 1:5),
                 TV.Show = c("Farscape", "Farscape", "Star Trek", "Doctor Who", "Doctor Who"),
                 Date = seq(as.Date("2014/01/01"), as.Date("2014/01/05"), "days"),
                 Ratings.North = c(1.1, 0.9, 4.8, 3.4, 5.5),
                 Ratings.South = c(0.1, NA, 1.8, 3.1, 3.5))
setkey(dt, "TV.Show")
dt
#    ID    TV.Show       Date Ratings.North Ratings.South
#    x4 Doctor Who 2014-01-04           3.4           3.1
#    x5 Doctor Who 2014-01-05           5.5           3.5
#    x1   Farscape 2014-01-01           1.1           0.1
#    x2   Farscape 2014-01-02           0.9            NA
#    x3  Star Trek 2014-01-03           4.8           1.8
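I want to reduce this data.table, grouping by "TV.Show", so that the numeric columns are summed (ignoring NA values) and the remaining columns keep the first value in each group.
In other words, I want to produce the following data.table: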
#    ID    TV.Show       Date Ratings.North Ratings.South
#    x4 Doctor Who 2014-01-04           8.9           6.6
#    x1   Farscape 2014-01-01           2.0           0.1
#    x3  Star Trek 2014-01-03           4.8           1.8
Answer 0 (score: 4)
How about using ifelse?
dt[, lapply(.SD, function(x) {
  ifelse(is.numeric(x), sum(x, na.rm = TRUE), x[1])
}), by = key(dt)]
#       TV.Show ID  Date Ratings.North Ratings.South
# 1: Doctor Who x4 16074           8.9           6.6
# 2:   Farscape x1 16071           2.0           0.1
# 3:  Star Trek x3 16073           4.8           1.8
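The Date column comes back as a bare number (16074 instead of 2014-01-04) because ifelse does not carry over the attributes of its yes/no arguments, and the Date class is one of those attributes. A minimal sketch of that behaviour in isolation; the variable names d, via_ifelse and via_if are only for illustration:

```r
d <- as.Date("2014-01-04")

# ifelse() builds its result from the logical test, so attributes of
# the yes/no values (including the Date class) are not carried over.
via_ifelse <- ifelse(FALSE, 0, d)

# A plain if/else returns the chosen object unchanged.
via_if <- if (FALSE) 0 else d

class(via_ifelse)  # "numeric"
class(via_if)      # "Date"
```

So if the Date column needs to stay a Date, a regular if/else inside the function is probably the safer choice.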
Answer 1 (score: 3)
dt[, lapply(.SD, function(x) {
  if (is.numeric(x)) {
    sum(x, na.rm = TRUE)  # numeric columns: sum, ignoring NA
  } else {
    head(x, 1)            # other columns: keep the first value
  }
}), by = TV.Show]
#       TV.Show ID       Date Ratings.North Ratings.South
# 1: Doctor Who x4 2014-01-04           8.9           6.6
# 2:   Farscape x1 2014-01-01           2.0           0.1
# 3:  Star Trek x3 2014-01-03           4.8           1.8
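If the same rule is needed in more than one place, the anonymous function can be pulled out into a named helper. The sketch below rebuilds the question's table so it runs on its own; sum_or_first is a hypothetical name, not something provided by data.table:

```r
library(data.table)

dt <- data.table(ID = paste0("x", 1:5),
                 TV.Show = c("Farscape", "Farscape", "Star Trek", "Doctor Who", "Doctor Who"),
                 Date = seq(as.Date("2014/01/01"), as.Date("2014/01/05"), "days"),
                 Ratings.North = c(1.1, 0.9, 4.8, 3.4, 5.5),
                 Ratings.South = c(0.1, NA, 1.8, 3.1, 3.5))
setkey(dt, TV.Show)

# Hypothetical helper: sum numeric columns (ignoring NA),
# keep the first value of everything else.
sum_or_first <- function(x) {
  if (is.numeric(x)) sum(x, na.rm = TRUE) else x[1L]
}

# One row per TV.Show; the Date column keeps its class.
res <- dt[, lapply(.SD, sum_or_first), by = TV.Show]
res

# Caveat: with na.rm = TRUE, a group whose values are all NA sums to 0, not NA.
all_na_sum <- sum(c(NA_real_, NA_real_), na.rm = TRUE)
```

Whether 0 is the right result for an all-NA group depends on the data, so that case may deserve its own check.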