I am trying to get the word with the highest frequency for each key in the data.table dtable4G:
key            freq  value
================================
thanks for the  612  support
thanks for the  380  drink
thanks for the  215  payment
thanks for the   27  encouragement
have a great    154  day
have a great    132  weekend
have a great     54  week
have a great     42  time
have a great     19  night
at the same     346  time
at the same      57  damn
at the same      30  pace
at the same      11  speed
at the same       7  level
at the same       1  rate
I tried
dtable4G[ , max(freq), by = key]
and
dtable4G[ , .I[which.max(freq)] , by = key]
With both of the commands above, I got the same result:
key              V1
====================
thanks for the  612
have a great    154
at the same     346
I want the result to be:
key            freq  value
================================
thanks for the  612  support
have a great    154  day
at the same     346  time
Any idea what I am doing wrong?
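For reference, neither attempt returns whole rows. With j = max(freq), data.table collapses each group to a single value, so only key and an unnamed result column V1 remain. With .I[which.max(freq)], V1 holds row numbers of dtable4G (1, 5 and 10 for this sample), not frequencies, and those row numbers still have to be used to subset the table. The sketch below rebuilds the sample data; the construction is an assumption reconstructed from the listing above. Because data.table() has its own key argument, the column named key is passed through as.data.table(list(...)) instead.

```r
library(data.table)

# Rebuild the sample table; data.table(key = ...) would hit the `key`
# argument of data.table(), so build from a plain list instead.
dtable4G <- as.data.table(list(
  key   = rep(c("thanks for the", "have a great", "at the same"), times = c(4, 5, 6)),
  freq  = c(612, 380, 215, 27, 154, 132, 54, 42, 19, 346, 57, 30, 11, 7, 1),
  value = c("support", "drink", "payment", "encouragement",
            "day", "weekend", "week", "time", "night",
            "time", "damn", "pace", "speed", "level", "rate")
))

# Attempt 1: j is an aggregate, so each group collapses to one value (V1)
# and the `value` column is dropped.
res_max <- dtable4G[, max(freq), by = key]
print(res_max)

# Attempt 2: .I is the row number within dtable4G, so V1 is a row index
# per group, not the frequency itself.
res_idx <- dtable4G[, .I[which.max(freq)], by = key]
print(res_idx)
```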
EDITED
dtable4G[dtable4G[, .I[which.max(freq)], by = key]$V1]
worked for me. It took a while to finish on my 5.4 million rows, but it is faster than using
dtable4G[, .SD[which.max(freq)], by = key]
Reference: With data.table, is SD[which.max(Var1)] the fastest way to find the max of a group?
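To check the speed difference on your own machine, a rough timing sketch is below. The table sizes, key names and value ranges are hypothetical synthetic data, not the 5.4 million-row table from the question, so the timings it prints will differ; it mainly shows that both approaches return the same rows.

```r
library(data.table)

set.seed(1)
# Hypothetical synthetic data: sizes are arbitrary assumptions.
n      <- 2e5
n_keys <- 2e4
big <- as.data.table(list(
  key   = sample(sprintf("k%06d", seq_len(n_keys)), n, replace = TRUE),
  freq  = sample.int(1000L, n, replace = TRUE),
  value = sample(letters, n, replace = TRUE)
))

# Approach 1: take the top row of each group's .SD subset.
t_sd <- system.time(res_sd <- big[, .SD[which.max(freq)], by = key])

# Approach 2: collect one row index per group, then subset the table once.
t_idx <- system.time(res_idx <- big[big[, .I[which.max(freq)], by = key]$V1])

# Elapsed seconds for each approach.
print(rbind(sd = t_sd, idx = t_idx)[, "elapsed"])
```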
Answer 0 (score: 2)
We can use the following to subset the data table to only the row with the maximum frequency for each value of the key column:
dtable4G[,.SD[which.max(freq)],by=key]
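A runnable sketch of this approach on the sample data from the question (the table construction is an assumption reconstructed from the listing; as.data.table(list(...)) is used because data.table() has its own key argument). Inside j, .SD is the current group's subset of the non-grouping columns, so indexing it with which.max(freq) keeps the whole first row holding that group's maximum:

```r
library(data.table)

# Sample data from the question, built from a list so the column can be named `key`.
dtable4G <- as.data.table(list(
  key   = rep(c("thanks for the", "have a great", "at the same"), times = c(4, 5, 6)),
  freq  = c(612, 380, 215, 27, 154, 132, 54, 42, 19, 346, 57, 30, 11, 7, 1),
  value = c("support", "drink", "payment", "encouragement",
            "day", "weekend", "week", "time", "night",
            "time", "damn", "pace", "speed", "level", "rate")
))

# .SD holds freq and value for the current key; which.max() returns the
# position of the first maximum, so ties resolve to the earliest row.
top_sd <- dtable4G[, .SD[which.max(freq)], by = key]
print(top_sd)
```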
For better performance, you can also use the following approach. It does not construct .SD, so it is faster:
dtable4G[dtable4G[, .I[which.max(freq)], by = key]$V1]
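The index-based version works in two steps: the inner query returns one row number per key in column V1, and the outer query subsets the original table by those numbers. The sketch below runs it on the sample data (construction assumed from the listing). It also shows a variant, not part of the original answer, that keeps every row tied for a group's maximum by collecting .I[freq == max(freq)] instead of a single which.max() position.

```r
library(data.table)

dtable4G <- as.data.table(list(
  key   = rep(c("thanks for the", "have a great", "at the same"), times = c(4, 5, 6)),
  freq  = c(612, 380, 215, 27, 154, 132, 54, 42, 19, 346, 57, 30, 11, 7, 1),
  value = c("support", "drink", "payment", "encouragement",
            "day", "weekend", "week", "time", "night",
            "time", "damn", "pace", "speed", "level", "rate")
))

# Step 1: one row number per key (first maximum only, like which.max).
idx <- dtable4G[, .I[which.max(freq)], by = key]$V1

# Step 2: subset the original table by those row numbers; all columns are kept.
top_idx <- dtable4G[idx]
print(top_idx)

# Variant: keep every row whose freq equals its group's maximum (ties included).
idx_ties <- dtable4G[, .I[freq == max(freq)], by = key]$V1
top_ties <- dtable4G[idx_ties]
```

With no ties in this sample, both results have the same three rows. On data with ties, the variant can return more than one row per key.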