Missing columns when using max in data.table

Date: 2015-12-24 03:22:29

Tags: r data.table

I am trying to get the highest-frequency word for each key in a data.table.

data.table: dtable4G

key              freq     value
================================
thanks for the   612      support
thanks for the   380      drink
thanks for the   215      payment
thanks for the    27      encouragement
have a great     154      day
have a great     132      weekend
have a great      54      week  
have a great      42      time
have a great      19      night
at the same      346      time
at the same       57      damn
at the same       30      pace
at the same       11      speed
at the same        7      level
at the same        1      rate 

I tried the following code:

dtable4G[ , max(freq), by = key] 

dtable4G[ , .I[which.max(freq)] , by = key]

Both commands give me only the key and a single V1 column instead of full rows. The first one returns:

key              V1
====================
thanks for the   612
have a great     154
at the same      346
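A minimal sketch of why this happens, assuming the question's column names and a small subset of its rows (the table name dt is just for illustration): j returns only the values it computes, so one unnamed result per group lands in a column called V1. With .I, V1 holds row numbers rather than frequencies.

```r
library(data.table)

# Small subset of the question's data. Built via data.frame() because
# data.table() has its own `key=` argument, which would not create a column named key.
dt <- as.data.table(data.frame(
  key   = c("thanks for the", "thanks for the", "have a great", "have a great"),
  freq  = c(612, 380, 154, 132),
  value = c("support", "drink", "day", "weekend"),
  stringsAsFactors = FALSE
))

# j returns only what it computes: one unnamed value per group, hence V1
maxima <- dt[, max(freq), by = key]            # V1 holds the maxima: 612, 154
rows   <- dt[, .I[which.max(freq)], by = key]  # V1 holds row numbers in dt: 1, 3

# Returning a named list from j keeps more than one column per group
res <- dt[, .(freq = max(freq), value = value[which.max(freq)]), by = key]
print(res)
```

Naming value explicitly in j works for a couple of columns; the index-based subset in the answer keeps every column without listing them.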

I want the result to be:

key              freq     value
================================
thanks for the   612      support
have a great     154      day
at the same      346      time

Any idea what I am doing wrong?

EDITED

dtable4G[dtable4G[, .I[which.max(freq)], by = key]$V1]

worked for me, although it took a while to run on my 5.4 million rows.

Still, it is faster than using:

dtable4G[, .SD[which.max(freq)], by = key]

Reference: With data.table, is SD[which.max(Var1)] the fastest way to find the max of a group?

1 answer:

Answer 0 (score: 2)

We can use the following to subset the data table to only the row with the maximum freq for each value of the key column:

dtable4G[,.SD[which.max(freq)],by=key]

For better performance, you can also use the following approach. It does not construct .SD, so it is faster:

dtable4G[dtable4G[, .I[which.max(freq)], by = key]$V1]
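A self-contained sketch comparing the two approaches on the question's sample data. It assumes only the column names shown in the question; the intermediate names res_sd, idx and res_i are illustrative. Both should return the same three full rows.

```r
library(data.table)

# Data from the question. Built via data.frame() because data.table() has its
# own `key=` argument, which would not create a column named key.
dtable4G <- as.data.table(data.frame(
  key   = rep(c("thanks for the", "have a great", "at the same"), times = c(4, 5, 6)),
  freq  = c(612, 380, 215, 27, 154, 132, 54, 42, 19, 346, 57, 30, 11, 7, 1),
  value = c("support", "drink", "payment", "encouragement",
            "day", "weekend", "week", "time", "night",
            "time", "damn", "pace", "speed", "level", "rate"),
  stringsAsFactors = FALSE
))

# Approach 1: pick the max-freq row of .SD within every group
res_sd <- dtable4G[, .SD[which.max(freq)], by = key]

# Approach 2: collect one row number per group with .I, then subset once
idx   <- dtable4G[, .I[which.max(freq)], by = key]$V1
res_i <- dtable4G[idx]

# Both keep all columns; which.max() picks the first row when freq ties
print(res_i)
```

The printed table should list support, day and time with frequencies 612, 154 and 346. The second form avoids materialising .SD for every group, which is why it tends to scale better on millions of rows.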