Reduce a data.table by group, using different operations depending on column class?

Asked: 2014-05-09 13:51:14

Tags: r dataframe data.table

Let's say I have the following data.table in R:

require(data.table)
dt <- data.table(ID = paste0("x", 1:5), 
                 TV.Show=c("Farscape", "Farscape", "Star Trek", "Doctor Who", "Doctor Who"), 
                 Date = seq(as.Date("2014/01/01"), as.Date("2014/01/05"), "days"),  
                 Ratings.North = c(1.1, 0.9, 4.8, 3.4, 5.5), 
                 Ratings.South= c(0.1, NA, 1.8, 3.1, 3.5))
setkey(dt, "TV.Show")
dt

# ID    TV.Show       Date Ratings.North Ratings.South
# x4 Doctor Who 2014-01-04           3.4           3.1
# x5 Doctor Who 2014-01-05           5.5           3.5
# x1   Farscape 2014-01-01           1.1           0.1
# x2   Farscape 2014-01-02           0.9            NA
# x3  Star Trek 2014-01-03           4.8           1.8

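I would like to reduce this data.table, grouping by "TV.Show", such that:

  1. the elements of each numeric column are summed within each group, and
  2. the first element of each non-numeric column, e.g. "ID" and "Date", is used as the new value for the reduced data.table row.

In other words, I would like to produce the following data.table: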

    # ID    TV.Show       Date Ratings.North Ratings.South
    # x4 Doctor Who 2014-01-04           8.9           6.6
    # x1   Farscape 2014-01-01           2.0           0.1
    # x3  Star Trek 2014-01-03           4.8           1.8
    

2 Answers:

Answer 0 (score: 4)

Using ifelse:

dt[, lapply(.SD, function(x) {
  ifelse(is.numeric(x), sum(x, na.rm = TRUE), x[1])
}), by = key(dt)]
#       TV.Show ID  Date Ratings.North Ratings.South
# 1: Doctor Who x4 16074           8.9           6.6
# 2:   Farscape x1 16071           2.0           0.1
# 3:  Star Trek x3 16073           4.8           1.8
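
In the output above, the Date column comes back as bare numbers (16074 and so on) rather than dates. As far as I can tell this is because `ifelse` builds its result from the logical test vector and does not carry over the class of `yes` or `no`. A minimal base-R sketch of that behaviour (the variable names `d`, `via_ifelse` and `via_if` are made up for illustration):

```r
# Base R only: why ifelse() turns a Date into a bare number.
d <- as.Date(c("2014-01-04", "2014-01-05"))

# ifelse() fills a copy of the (logical) test vector with the chosen values,
# so the "Date" class of d[1] is not carried over to the result.
via_ifelse <- ifelse(is.numeric(d), NA, d[1])
class(via_ifelse)  # "numeric"
via_ifelse         # 16074, i.e. days since 1970-01-01

# A plain if/else returns the chosen object itself, class included.
via_if <- if (is.numeric(d)) NA else d[1]
class(via_if)      # "Date"
```

If the Date class matters, a plain `if`/`else` inside the function avoids the problem.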

Answer 1 (score: 3)

dt[, lapply(.SD, function(x) {
  if (is.numeric(x)) {
    sum(x, na.rm = TRUE)   # numeric columns: total per group, ignoring NA
  } else {
    head(x, 1)             # other columns: first value per group
  }
}), by = TV.Show]

#      TV.Show ID       Date Ratings.North Ratings.South
#1: Doctor Who x4 2014-01-04           8.9           6.6
#2:   Farscape x1 2014-01-01           2.0           0.1
#3:  Star Trek x3 2014-01-03           4.8           1.8
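
For completeness, here is a self-contained sketch of the same idea that also restores the column order asked for in the question (ID first, then TV.Show). The helper name `reduce_col` is hypothetical, not something from the original answers, and `setcolorder` is only used to match the requested layout.

```r
library(data.table)

dt <- data.table(
  ID = paste0("x", 1:5),
  TV.Show = c("Farscape", "Farscape", "Star Trek", "Doctor Who", "Doctor Who"),
  Date = seq(as.Date("2014/01/01"), as.Date("2014/01/05"), "days"),
  Ratings.North = c(1.1, 0.9, 4.8, 3.4, 5.5),
  Ratings.South = c(0.1, NA, 1.8, 3.1, 3.5)
)
setkey(dt, TV.Show)

# Hypothetical helper: sum numeric columns (ignoring NA),
# keep the first value of every other column.
reduce_col <- function(x) {
  if (is.numeric(x)) sum(x, na.rm = TRUE) else x[1]
}

# Apply the helper to every non-grouping column within each TV.Show group.
res <- dt[, lapply(.SD, reduce_col), by = TV.Show]

# The grouping column comes first in the result; put columns back in the
# original order so the layout matches the table in the question.
setcolorder(res, names(dt))
res
```

Because each group returns a length-one Date from `x[1]`, the Date column keeps its class here.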