Carry forward matching row values on a group-by max() call (to get the values that correspond to the maximum of a specific column)

Time: 2015-11-05 16:41:10

Tags: r data.table

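For each year present in the data table, I'm trying to find the maximum daily frequency of events in that year and the date on which it occurred. I can get the maximum by year: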

dt[, .N, by = DATE][, .(max(N)), by=format(DATE, "%Y")]

But how can I also retrieve the full DATE (not just the year) that matches this maximum?

Here's what I tried:

dt[, .N, by=DATE][which(N==max(N)), .(max(N), d:=DATE),by=format(DATE, "%Y")]

Judging by the error message, it doesn't work:

Error in `[.data.table`(dt[, .N, by = DATE], which(N == max(N)), .(max(N),  : 
  'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=eval(format(DATE, "%Y")) should work. This is for efficiency so data.table can detect which columns are needed.

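I know how to go back to dt and grab the row corresponding to the maximum, but I'd like to do better. Is it possible to do this through subsetting, as attempted above?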
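For reference, the "go back to dt" approach mentioned above can be sketched as a join on (year, count). This is a minimal sketch, assuming a hypothetical toy table dt with one row per event and a single Date column DATE; the names daily, yearly_max and res are illustrative only.

```r
library(data.table)

# Hypothetical toy data (assumption): one row per event, DATE column only
dt <- data.table(DATE = as.Date(c("2013-09-26", "2013-09-26", "2013-10-01",
                                  "2014-01-21", "2014-01-21", "2014-01-21",
                                  "2014-02-02")))

# Daily event counts, tagged with their year
daily <- dt[, .N, by = DATE]
daily[, Y := year(DATE)]

# Maximum daily count per year
yearly_max <- daily[, .(maxN = max(N)), by = Y]

# Join back on (year, count) to recover the date(s) that reach the maximum;
# if several dates tie, all of them are returned
res <- daily[yearly_max, on = .(Y, N = maxN)]
res
```

Unlike which.max(), the join keeps every tied date, which may or may not be what you want.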

Apologies if I missed an SO post about this, but I couldn't find anything.

Here is a sample of dt:

> dput(dt[sample(1:600000, size = 500), DATE])
structure(c(16091, 15909, 15987, 16509, 16294, 16610, 16297, 
15898, 15928, 15949, 16351, 16203, 16215, 15799, 16506, 15931, 
16091, 15825, 15860, 15814, 15975, 16233, 16108, 16590, 15700, 
16019, 16178, 16287, 16730, 16366, 16678, 16010, 16157, 16116, 
15794, 16157, 16010, 16171, 16721, 16640, 16302, 15939, 15928, 
16325, 15837, 15848, 15730, 15828, 16414, 16431, 16389, 16003, 
16444, 16255, 16268, 16226, 16205, 15765, 16060, 15938, 16376, 
15934, 15871, 16163, 16568, 15899, 16597, 16160, 16538, 15703, 
16002, 16371, 16019, 16138, 16091, 15874, 16298, 16086, 15753, 
16310, 16209, 15843, 16307, 16472, 16319, 16519, 15743, 16480, 
16323, 16674, 16147, 16013, 15986, 16616, 16480, 16494, 16030, 
16614, 16447, 15991, 15977, 15884, 16707, 16614, 16470, 16193, 
16453, 16342, 16109, 15731, 16321, 16421, 15974, 16578, 16718, 
16183, 15721, 15854, 16470, 16368, 16399, 16433, 16721, 16624, 
16514, 15918, 16370, 15910, 16308, 15973, 16579, 16606, 16192, 
16445, 16671, 15927, 15958, 16140, 15957, 16623, 16416, 15852, 
15913, 16190, 15930, 16420, 15808, 15862, 16507, 16447, 16109, 
15732, 16700, 15911, 16183, 16215, 16584, 15840, 16628, 16138, 
16500, 16477, 16184, 16510, 16374, 16668, 16278, 16642, 16713, 
16324, 16200, 16255, 15960, 16395, 15869, 16282, 16736, 16164, 
16416, 16496, 16565, 15741, 16308, 16441, 16607, 16190, 15938, 
16045, 15758, 16219, 16165, 16357, 16353, 16731, 16063, 15740, 
16220, 16522, 15864, 15922, 16223, 15806, 16660, 16471, 15954, 
16369, 15750, 15957, 16156, 16367, 16654, 16165, 16109, 15863, 
16204, 15929, 15812, 15987, 16275, 16552, 15741, 15906, 15929, 
16295, 15974, 15749, 15830, 15892, 16266, 16208, 15793, 15768, 
15721, 16707, 15903, 16624, 16552, 16695, 16116, 16573, 16344, 
16452, 16539, 16195, 15851, 16140, 16152, 15736, 16179, 15846, 
16363, 16404, 16522, 16723, 16021, 16232, 16081, 16206, 16183, 
15920, 16543, 15989, 15974, 16212, 16396, 16473, 16502, 16532, 
16326, 15882, 16607, 15848, 15954, 16419, 15752, 16030, 16429, 
16222, 16213, 16626, 16049, 16738, 16256, 16198, 16599, 15727, 
16707, 16433, 15863, 16145, 16188, 15862, 15707, 16475, 16130, 
15887, 16647, 15974, 16221, 15773, 16059, 16662, 16250, 15689, 
15753, 15833, 16365, 16646, 16366, 16130, 16712, 15859, 16480, 
15983, 16377, 16091, 16121, 15821, 16505, 16018, 16254, 15937, 
16322, 16490, 15899, 16377, 16319, 16262, 16215, 16005, 16318, 
16488, 16350, 16275, 16723, 16616, 16593, 15918, 16264, 15897, 
15931, 16204, 16603, 16192, 16377, 15837, 16737, 16466, 16271, 
15804, 15987, 16622, 16634, 16227, 16297, 16597, 16232, 16393, 
15842, 15999, 15716, 16092, 16080, 16553, 16068, 16129, 16012, 
16383, 16150, 16611, 16602, 16254, 15728, 15958, 15827, 16111, 
16097, 16112, 16648, 16510, 16417, 16021, 16660, 15793, 16016, 
16188, 16034, 16415, 16270, 16728, 16153, 16028, 16286, 16731, 
15905, 15710, 16208, 16300, 16522, 16062, 16310, 16535, 16111, 
16682, 15957, 16051, 16597, 16063, 15828, 16658, 16213, 16262, 
15814, 15912, 16115, 15716, 15976, 16665, 16723, 15766, 15825, 
16682, 16547, 16402, 16486, 16085, 16231, 16126, 16398, 15762, 
16563, 15796, 15993, 15943, 16020, 15727, 16671, 16044, 15921, 
16511, 15787, 16128, 16376, 16502, 15751, 16317, 16444, 16032, 
15839, 16588, 15780, 15926, 16722, 16225, 16523, 16450, 16661, 
16702, 16223, 15977, 16586, 16221, 16252, 15853, 16309, 15838, 
16505, 16143, 16526, 15980, 15970, 15718, 16713, 16021, 16546, 
16469, 16452, 15729, 16309, 16543, 16386, 16554, 16349, 16595, 
16499, 16359, 16322, 16547, 16415, 16112, 15898, 16008, 16275, 
15975, 16197, 15740, 15959, 16346, 16364, 16522), class = "Date")

2 answers:

Answer 0 (score: 2)

Why not simply subset .SD with which.max(N), grouped by year?

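require(data.table)
# x is the Date vector from the dput() output in the question;
# .SDcols=1:2 keeps x (as well as N) in .SD even though by uses year(x)
data.table(x)[, .N, by=x][, .SD[which.max(N)], by=year(x), .SDcols=1:2]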
#    year          x N
# 1: 2014 2014-01-21 4
# 2: 2013 2013-09-26 4
# 3: 2015 2015-03-28 4
# 4: 2012 2012-12-26 1

Once you're familiar with .SD, most operations just use base R functions.
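As an illustration of swapping in other base R functions: which.max() returns only the first maximum in each group, while N == max(N) keeps every tied date. This is a minimal sketch on hypothetical daily counts (the table daily and its values are assumptions); .SDcols is given explicitly so that DATE stays in .SD even though by uses year(DATE).

```r
library(data.table)

# Hypothetical daily counts (assumption) with a tie in 2013
daily <- data.table(DATE = as.Date(c("2013-09-26", "2013-10-01",
                                     "2014-01-21", "2014-02-02")),
                    N = c(2L, 2L, 3L, 1L))

# which.max() keeps only the first row reaching the maximum in each year
first_only <- daily[, .SD[which.max(N)], by = year(DATE), .SDcols = c("DATE", "N")]

# N == max(N) keeps every row reaching the maximum in each year
all_ties <- daily[, .SD[N == max(N)], by = year(DATE), .SDcols = c("DATE", "N")]

first_only
all_ties
```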

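Regarding your attempt: the general form of a data.table query is DT[i, j, by], meaning: subset rows in i, then compute j grouped by by. So you can't supply a condition in i that is evaluated within groups. And d := DATE inside .() isn't valid syntax at all.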
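To make the i-versus-j point concrete, here is a minimal sketch on hypothetical daily counts (daily, global_max, idx and per_year are illustrative names, not from the thread). A condition in i is evaluated once over the whole table, whereas the common .I idiom moves the per-group selection into j and then subsets by the returned row numbers.

```r
library(data.table)

# Hypothetical daily counts (assumption)
daily <- data.table(DATE = as.Date(c("2013-09-26", "2013-10-01",
                                     "2014-01-21", "2014-02-02")),
                    N = c(2L, 1L, 3L, 1L))

# i is evaluated once over the whole table: this keeps only the global maximum
global_max <- daily[which(N == max(N))]

# Per-group selection has to happen in j. .I holds each group's row numbers in
# daily, so .I[which.max(N)] is the row of that year's maximum (returned as V1)
idx <- daily[, .I[which.max(N)], by = .(Y = year(DATE))]$V1
per_year <- daily[idx]

global_max
per_year
```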

Please read the vignettes; all of this is covered there.

Answer 1 (score: 1)

This is what I came up with:

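# Precompute the year as a plain column so that DATE stays in .SD
DT[, Y := year(DATE)]

# Within each year: count events per DATE on a copy of .SD
# (copy() is needed because := cannot modify .SD in place),
# then keep the row with the largest count
DT[, copy(.SD)[, n := .N, by=DATE][which.max(n)], by=Y]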


      Y       DATE n
1: 2014 2014-01-21 4
2: 2013 2013-09-26 4
3: 2015 2015-03-28 4
4: 2012 2012-12-26 1

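I wish there were a better way. I created Y because data.table currently excludes a column from .SD if that column appears in any transformation in by (such as year(DATE)), and the j above needs DATE inside .SD.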
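One possible alternative that avoids copy(.SD), sketched under the assumption that DT has one row per event in a Date column DATE (the data below is hypothetical): compute the daily counts once, order by year and descending count, and keep the first row per year with unique(..., by = "Y"). Like which.max(), this returns a single date per year when there are ties.

```r
library(data.table)

# Hypothetical event data (assumption): one row per event
DT <- data.table(DATE = as.Date(c("2013-09-26", "2013-09-26", "2013-10-01",
                                  "2014-01-21", "2014-01-21", "2014-01-21",
                                  "2014-02-02")))

# Daily counts plus a plain year column, so nothing in by depends on DATE
daily <- DT[, .N, by = DATE]
daily[, Y := year(DATE)]

# Sort by year, then by count descending; unique(by = "Y") keeps the first
# (i.e. highest-count) row of each year
top <- unique(daily[order(Y, -N)], by = "Y")
top
```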