How to further group and look up aggregated values within .SD in data.table

Date: 2015-07-08 21:35:03

Tags: r data.table

This is related to a previous question about grouping/lookup in data.table, but with additional output.

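I am trying to group within the .SD subsets and pick values from each subset. For example, in the flights dataset I want to know: for each airport and month, which UniqueCarrier and which Dest have the worst average ArrDelay. So there are essentially two levels of aggregation: the outer grouping by airport and month, and an inner grouping by carrier (or destination) within each of those groups.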

I have a working solution, shown below, but I would like to know if there is a better one.

library(data.table)
library(hflights)

DT <- as.data.table(hflights)

setkey(DT, Origin, Month)

#The solution code...
DT[, {
 t1 <- .SD[, .(mean(na.omit(ArrDelay))) , by=UniqueCarrier];
 max1 <- which.max(t1$V1);
 t2 <- .SD[, .(mean(na.omit(ArrDelay))) , by=Dest];
 max2 <- which.max(t2$V1);
 list( MaxAvgDelayForCarrier = t1$UniqueCarrier[max1], MaxAvgDelayByCarrier = t1$V1[max1],  MaxAvgDelayByDest= t2$Dest[max2], MaxAvgDelayForDest= t2$V1[max2] )
},  by = .(Origin, Month)]

# Checking for correctness
head(DT[ .("HOU", 1), .(MaxAvgDelayByCarrier=mean(na.omit(ArrDelay))), by=UniqueCarrier][order(-MaxAvgDelayByCarrier)],1)
head(DT[ .("IAH", 2), .(MaxAvgDelayForDest=mean(na.omit(ArrDelay))), by=Dest][order(-MaxAvgDelayForDest)],1)
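Because the hflights results are hard to check by eye, here is a minimal sketch of the same two-level aggregation on a tiny hypothetical table (toy, with made-up delays) whose answers can be worked out by hand. The output column names (WorstCarrier, WorstDest, etc.) are illustrative only, and mean(ArrDelay, na.rm = TRUE) is used in place of mean(na.omit(ArrDelay)); for a numeric vector the two give the same value.

```r
library(data.table)

# Hypothetical toy data using the same column names as hflights
toy <- data.table(
  Origin        = c("HOU", "HOU", "HOU", "HOU", "IAH", "IAH", "IAH"),
  Month         = 1L,
  UniqueCarrier = c("AA", "AA", "WN", "WN", "AA", "CO", "CO"),
  Dest          = c("DAL", "LAX", "DAL", "LAX", "DAL", "DAL", "LAX"),
  ArrDelay      = c(10, NA, 30, 50, 5, 40, 0)
)
setkey(toy, Origin, Month)

res <- toy[, {
  # Inner aggregation 1: mean delay per carrier within this Origin/Month group
  t1 <- .SD[, .(V1 = mean(ArrDelay, na.rm = TRUE)), by = UniqueCarrier]
  max1 <- which.max(t1$V1)
  # Inner aggregation 2: mean delay per destination within the same group
  t2 <- .SD[, .(V1 = mean(ArrDelay, na.rm = TRUE)), by = Dest]
  max2 <- which.max(t2$V1)
  # One output row per group: worst carrier and worst destination
  list(WorstCarrier      = t1$UniqueCarrier[max1],
       WorstCarrierDelay = t1$V1[max1],
       WorstDest         = t2$Dest[max2],
       WorstDestDelay    = t2$V1[max2])
}, by = .(Origin, Month)]

print(res)
```

Working it by hand: for HOU, WN averages 40 against 10 for AA (whose NA is dropped), and LAX averages 50 against 20 for DAL; for IAH, the worst are CO (20) and DAL (22.5).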

1 Answer:

Answer 0 (score: 3)

I think your code is fine, but I would write it like this, without storing so many intermediate objects (t1, etc.):

DT[,c(

  .SD[,
    .(CMaxVal = mean(na.omit(ArrDelay))),
  by=.(CMax = UniqueCarrier)][which.max(CMaxVal)],

  .SD[,
    .(DMaxVal = mean(na.omit(ArrDelay))),
  by=.(DMax = Dest)][which.max(DMaxVal)]

),by=key(DT)]

which gives

    Origin Month CMax   CMaxVal DMax  DMaxVal
 1:    HOU     1   F9 13.725806  PHL 20.12500
 2:    HOU     2   B6 17.822222  ECP 20.17308
 3:    HOU     3   EV 23.088889  PHL 46.06452
 4:    HOU     4   EV 27.847826  PHL 67.93333
 5:    HOU     5   EV 25.436620  PHL 75.61290
 6:    HOU     6   EV 16.930233  EWR 34.87755
 7:    HOU     7   B6 20.016129  CHS 21.54839
 8:    HOU     8   B6 30.163636  JFK 30.16364
 9:    HOU     9   DL 18.625000  EWR 14.32143
10:    HOU    10   DL 17.803279  PHL 22.51613
11:    HOU    11   F9  3.000000  EWR 18.46429
12:    HOU    12   MQ 13.554502  EWR 28.17857
13:    IAH     1   EV 15.682353  HNL 21.52632
14:    IAH     2   MQ 19.946809  BPT 29.00000
15:    IAH     3   AS 15.354839  SFO 27.43590
16:    IAH     4   MQ 16.263441  SEA 22.48515
17:    IAH     5   MQ 25.179104  DAY 25.96154
18:    IAH     6   UA 24.453125  ANC 34.06667
19:    IAH     7   OO 15.117419  DSM 32.39286
20:    IAH     8   UA 17.297561  ANC 37.96552
21:    IAH     9   UA 11.620000  SJU 16.76923
22:    IAH    10   UA 11.601266  CID 16.88462
23:    IAH    11   MQ  8.445545  CID 18.04167
24:    IAH    12   XE 11.376852  HOB 25.95556
    Origin Month CMax   CMaxVal DMax  DMaxVal
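Why the c(...) form works: each .SD[...][which.max(...)] call returns a one-row data.table, and since a data.table is a list of columns, c() concatenates the two results into one plain list of four named columns, which data.table then lays out as one row per Origin/Month group. A minimal sketch of the same pattern on a small hypothetical toy table (made-up delays, hflights column names, the answer's CMax/DMax names kept; na.rm = TRUE stands in for na.omit):

```r
library(data.table)

# Hypothetical toy data using the same column names as hflights
toy <- data.table(
  Origin        = c("HOU", "HOU", "HOU", "HOU", "IAH", "IAH", "IAH"),
  Month         = 1L,
  UniqueCarrier = c("AA", "AA", "WN", "WN", "AA", "CO", "CO"),
  Dest          = c("DAL", "LAX", "DAL", "LAX", "DAL", "DAL", "LAX"),
  ArrDelay      = c(10, NA, 30, 50, 5, 40, 0)
)
setkey(toy, Origin, Month)

res <- toy[, c(
  # Worst carrier: mean per carrier, then keep the row with the largest mean
  .SD[, .(CMaxVal = mean(ArrDelay, na.rm = TRUE)),
      by = .(CMax = UniqueCarrier)][which.max(CMaxVal)],
  # Worst destination: same idea, grouped by Dest
  .SD[, .(DMaxVal = mean(ArrDelay, na.rm = TRUE)),
      by = .(DMax = Dest)][which.max(DMaxVal)]
), by = key(toy)]

print(res)
```

Note that which.max() returns only the first maximum, so if two carriers tie for the worst average, only the one that appears first in the group is reported.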

The approach above requires coding each grouping variable by hand. You could instead...

max1
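One way to avoid hard-coding each grouping variable is to loop over a character vector of column names with lapply() and join the per-column results with do.call(c, ...). This is a sketch of our own on hypothetical toy data, not necessarily the approach the answer goes on to describe; the helper worst_by() and the Worst* column names are illustrative assumptions.

```r
library(data.table)

# Hypothetical toy data using the same column names as hflights
toy <- data.table(
  Origin        = c("HOU", "HOU", "HOU", "HOU", "IAH", "IAH", "IAH"),
  Month         = 1L,
  UniqueCarrier = c("AA", "AA", "WN", "WN", "AA", "CO", "CO"),
  Dest          = c("DAL", "LAX", "DAL", "LAX", "DAL", "DAL", "LAX"),
  ArrDelay      = c(10, NA, 30, 50, 5, 40, 0)
)
setkey(toy, Origin, Month)

# Hypothetical helper: within one group's .SD, find the level of `col`
# with the largest mean ArrDelay and return it as a one-row data.table
worst_by <- function(sd, col) {
  agg <- sd[, .(Delay = mean(ArrDelay, na.rm = TRUE)), by = col][which.max(Delay)]
  # Rename e.g. Dest/Delay to WorstDest/WorstDestDelay so names stay unique
  setnames(agg, c(col, "Delay"), paste0("Worst", col, c("", "Delay")))
  agg
}

cols <- c("UniqueCarrier", "Dest")  # grouping columns to scan
res <- toy[, do.call(c, lapply(cols, function(col) worst_by(.SD, col))),
           by = key(toy), .SDcols = c(cols, "ArrDelay")]

print(res)
```

Adding another column name to cols adds another pair of Worst* columns without touching the j expression; .SDcols is set explicitly so that .SD is guaranteed to contain every column the helper reads.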