这与data.table上的分组/查找的先前question有关,但是有额外的输出。
我正在尝试对子集.SD执行分组,并从每个子集中选择值。例如,在航班数据集中,我想知道:对于每个机场和月份,哪个UniqueCarrier和Destination具有最差的平均ArrDelay。因此基本上有两个级别的聚合。
我有如下工作解决方案..但是,如果有更好的解决方案,那将是很好的理解。
library(data.table)
library(hflights)
DT <- as.data.table(hflights)
setkey(DT, Origin, Month)
#The solution code...
DT[, {
t1 <- .SD[, .(mean(na.omit(ArrDelay))) , by=UniqueCarrier];
max1 <- which.max(t1$V1);
t2 <- .SD[, .(mean(na.omit(ArrDelay))) , by=Dest];
max2 <- which.max(t2$V1);
list( MaxAvgDelayForCarrier = t1$UniqueCarrier[max1], MaxAvgDelayByCarrier = t1$V1[max1], MaxAvgDelayByDest= t2$Dest[max2], MaxAvgDelayForDest= t2$V1[max2] )
}, by = .(Origin, Month)]
# Checking for correctness
head(DT[ .("HOU", 1), .(MaxAvgDelayByCarrier=mean(na.omit(ArrDelay))), by=UniqueCarrier][order(-MaxAvgDelayByCarrier)],1)
head(DT[ .("IAH", 2), .(MaxAvgDelayForDest=mean(na.omit(ArrDelay))), by=Dest][order(-MaxAvgDelayForDest)],1)
答案 0 :(得分:3)
我认为你的代码很好,但我会这样写:
RewriteCond %{QUERY_STRING} ^keyword=([^&]+)&searchwordsugg=&option=com_virtuemart&page=shop.browse&view=category$
RewriteRule ^component/search/$ /componet/virtuemart/?keyword=%1&search=true&view=category&option=com_virtuemart&virtuemart_category_id=0 [L,R]
给出了
DT[,c(
.SD[,
.(CMaxVal = mean(na.omit(ArrDelay))),
by=.(CMax = UniqueCarrier)][which.max(CMaxVal)],
.SD[,
.(DMaxVal = mean(na.omit(ArrDelay))),
by=.(DMax = Dest)][which.max(DMaxVal)]
),by=key(DT)]
无需存储这么多中间对象( Origin Month CMax CMaxVal DMax DMaxVal
1: HOU 1 F9 13.725806 PHL 20.12500
2: HOU 2 B6 17.822222 ECP 20.17308
3: HOU 3 EV 23.088889 PHL 46.06452
4: HOU 4 EV 27.847826 PHL 67.93333
5: HOU 5 EV 25.436620 PHL 75.61290
6: HOU 6 EV 16.930233 EWR 34.87755
7: HOU 7 B6 20.016129 CHS 21.54839
8: HOU 8 B6 30.163636 JFK 30.16364
9: HOU 9 DL 18.625000 EWR 14.32143
10: HOU 10 DL 17.803279 PHL 22.51613
11: HOU 11 F9 3.000000 EWR 18.46429
12: HOU 12 MQ 13.554502 EWR 28.17857
13: IAH 1 EV 15.682353 HNL 21.52632
14: IAH 2 MQ 19.946809 BPT 29.00000
15: IAH 3 AS 15.354839 SFO 27.43590
16: IAH 4 MQ 16.263441 SEA 22.48515
17: IAH 5 MQ 25.179104 DAY 25.96154
18: IAH 6 UA 24.453125 ANC 34.06667
19: IAH 7 OO 15.117419 DSM 32.39286
20: IAH 8 UA 17.297561 ANC 37.96552
21: IAH 9 UA 11.620000 SJU 16.76923
22: IAH 10 UA 11.601266 CID 16.88462
23: IAH 11 MQ 8.445545 CID 18.04167
24: IAH 12 XE 11.376852 HOB 25.95556
Origin Month CMax CMaxVal DMax DMaxVal
,t1
等)。
上述方法需要手动编码每个分组变量。你可以改为......
max1