How do I use data.table's .N in i?

Asked: 2019-07-03 19:25:17

Tags: r data.table

The help page for special-symbols in data.table says ".N can be used in i as well." How do I do that?

For example, I would like the following code to keep only the rows that belong to groups with a single element.

> library(data.table)
> set.seed(734)
> dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                   y = runif(11))
> dt
    x          y
 1: a 0.46431448
 2: a 0.57148294
 3: a 0.30197960
 4: a 0.06394102
 5: a 0.08793526
 6: b 0.62994539
 7: b 0.64693916
 8: b 0.79671939
 9: c 0.60865117
10: d 0.86025196
11: e 0.21562992

> dt[.N == 1, .(y), by = .(x)]
Empty data.table (0 rows) of 2 cols: x,y

I would expect it to give the same result as:

> dt[, .(n = .N, y = y), by = .(x)][n == 1, .(x, y)]
   x         y
1: c 0.6086512
2: d 0.8602520
3: e 0.2156299

If not the way I tried in the example above, how do I use .N in the i of a data.table?

1 Answer:

Answer 0: (score: 0)

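A logical expression based on .N does not work per group when placed in i. Instead, get the row indices (.I) from an expression in j, extract the index column ($V1), and use it to subset the rows: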

dt[dt[, .I[.N == 1], by = .(x)]$V1]
#   x         y
#1: c 0.6086512
#2: d 0.8602520
#3: e 0.2156299
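To make the two steps in that one-liner easier to follow, here is a minimal sketch that splits it apart. It reuses the data from the question; the names idx and res are only illustrative.

```r
library(data.table)

# Same data as in the question.
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

# Step 1: within each group, keep the global row numbers (.I) only when the
# group has exactly one row. Larger groups contribute integer(0) and so
# produce no rows in the result.
idx <- dt[, .I[.N == 1], by = .(x)]
idx   # one row per single-element group; the unnamed j result is called V1

# Step 2: extract the row numbers and subset the original table with them.
res <- dt[idx$V1]
res
```

The intermediate table idx holds the row numbers 9, 10 and 11 for the groups c, d and e, which is why subsetting with idx$V1 returns exactly those rows.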

The same expression can also be used to subset .SD (this may be slow):

dt[, .SD[.N == 1], .(x)]
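Two other ways of writing the same filter are sketched below. They are not part of the original answer and are offered only as alternatives, again using the question's data; res_if, dt2 and res_n are illustrative names.

```r
library(data.table)

# Same data as in the question.
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

# Alternative 1: return the group's rows only when the group has one row.
# An `if` without `else` yields NULL, and groups returning NULL are dropped.
res_if <- dt[, if (.N == 1L) .SD, by = .(x)]

# Alternative 2: store the group size as a column first, then filter in i.
# copy() keeps the original dt unchanged, since := modifies by reference.
dt2 <- copy(dt)[, n := .N, by = .(x)]
res_n <- dt2[n == 1L, .(x, y)]
```

Both variants keep the grouping logic in j (or in a helper column), so i only ever sees an ordinary column comparison.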

Regarding the usage described in ?.N,

  

.SD, .BY, .N, .I and .GRP are read-only symbols for use in j. .N can be used in i as well.

However, it does not say in what context. If we use only an i expression

dt[.N > 2] # works

or i together with j

dt[.N > 2, .(x)]
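The behavior of these calls is consistent with .N in i being the row count of the whole table rather than a per-group count. The sketch below illustrates that reading with the question's data; it is an interpretation of the observed results, and the variable names are illustrative.

```r
library(data.table)

# Same data as in the question.
set.seed(734)
dt <- data.table(x = c(rep("a", 5), rep("b", 3), "c", "d", "e"),
                 y = runif(11))

# In i, .N is evaluated on the whole table, so here it is 11.
last_row <- dt[.N]             # row 11, i.e. the last row

# 11 > 2 is a single TRUE, so every row is kept.
n_all <- nrow(dt[.N > 2])

# 11 == 1 is a single FALSE, so no row is kept, whatever the groups are.
n_none <- nrow(dt[.N == 1])
```

Under this reading, dt[.N == 1, .(y), by = .(x)] is empty because i has already removed every row before any grouping takes place.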

To see how the calls are processed, use verbose = TRUE

dt[.N == 1, .SD, by = .(x), verbose = TRUE]
#i clause present and columns used in by detected, only these subset: x 
#lapply optimization changed j from '.SD' to 'list(y)'
#Old mean optimization is on, left j unchanged.
#Making each group and running j (GForce FALSE) ... 
#  memcpy contiguous groups took 0.000s for 1 groups
#  eval(j) took 0.000s for 1 calls
#0.046s elapsed (0.268s cpu) 
#Empty data.table (0 rows and 2 cols): x,y

dt[dt[, .I[.N == 1], by = .(x), verbose = TRUE]$V1]
#Detected that j uses these columns: <none> 
#Finding groups using forderv ... 0.032s elapsed (0.033s cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.033s elapsed (0.194s cpu) 
#lapply optimization is on, j unchanged as '.I[.N == 1]'
#GForce is on, left j unchanged
#Old mean optimization is on, left j unchanged.
#Making each group and running j (GForce FALSE) ... dogroups: growing from 0 to 2 rows
#dogroups: growing from 2 to 4 rows
#Wrote less rows (3) than allocated (4).

#  memcpy contiguous groups took 0.000s for 5 groups
#  eval(j) took 0.000s for 5 calls
#0.046s elapsed (0.273s cpu)