I have a data.table qq. Here is its dput:
structure(list(Sepal.Length = c(NA, NA, 4.7, NA, 5), Sepal.Width = c(3.5,
NA, 3.2, 3.1, 3.6), Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4),
Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2), Species = c("a",
"a", "b", "b", "b")), .Names = c("Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width", "Species"), row.names = c(NA,
-5L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000011c90788>)
> qq
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1: NA 3.5 1.4 0.2 a
2: NA NA 1.4 0.2 a
3: 4.7 3.2 1.3 0.2 b
4: NA 3.1 1.5 0.2 b
5: 5.0 3.6 1.4 0.2 b
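For each category of the column Species, I want to get the number of rows in which the column Sepal.Length is not NA. The final result should look like this (see the new column no_rows):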
> qq
Sepal.Length Sepal.Width Petal.Length Petal.Width Species no_rows
1: NA 3.5 1.4 0.2 a 0
2: NA NA 1.4 0.2 a 0
3: 4.7 3.2 1.3 0.2 b 2
4: NA 3.1 1.5 0.2 b 2
5: 5.0 3.6 1.4 0.2 b 2
I tried the following:
qq[, no_rows:= nrow(qq[is.na(Sepal.Length) == F, ]), .(Species)]
But it does not work. Can someone give me a working data.table solution?
Answer 0 (score: 4)
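We group by 'Species', take the sum of the logical index (!is.na(Sepal.Length)), and assign (:=) the output to create 'no_rows'. Summing a logical vector counts its TRUE values, so this gives the number of non-NA rows in each group.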
qq[, no_rows := sum(!is.na(Sepal.Length)) , by = Species]
qq
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species no_rows
#1: NA 3.5 1.4 0.2 a 0
#2: NA NA 1.4 0.2 a 0
#3: 4.7 3.2 1.3 0.2 b 2
#4: NA 3.1 1.5 0.2 b 2
#5: 5.0 3.6 1.4 0.2 b 2
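
As a side note on why the attempt in the question likely misbehaves: inside `j`, the name `qq` still refers to the whole table rather than the current group, so the same overall count is assigned to every group. The sketch below rebuilds the example data with `data.table()` (the `dput` output above cannot be pasted back as-is because of the pointer field) and compares the attempt with a per-group version based on `.SD`. The names `attempt`, `fixed`, and `counts` are only illustrative.

```r
library(data.table)

# Rebuild the example table from the question
qq <- data.table(
  Sepal.Length = c(NA, NA, 4.7, NA, 5),
  Sepal.Width  = c(3.5, NA, 3.2, 3.1, 3.6),
  Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4),
  Petal.Width  = c(0.2, 0.2, 0.2, 0.2, 0.2),
  Species      = c("a", "a", "b", "b", "b")
)

# Attempt from the question: `qq` inside j is the full table,
# so every group receives the overall non-NA count
attempt <- copy(qq)[, no_rows := nrow(qq[is.na(Sepal.Length) == FALSE, ]), .(Species)]

# Per-group variant: .SD holds only the rows of the current group
fixed <- copy(qq)[, no_rows := nrow(.SD[!is.na(Sepal.Length)]), by = Species]

# One row per Species instead of a new column on the original table
counts <- qq[, .(no_rows = sum(!is.na(Sepal.Length))), by = Species]

print(attempt)
print(fixed)
print(counts)
```

The `sum(!is.na(...))` form in the answer avoids subsetting `.SD` for each group, which is probably the cheaper option on large tables.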