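I am trying to count the number of duplicates of each unique string value in column z, within each combination of two other columns (x, y) of a data.table. I would like to use the data.table package or something similarly fast, since I have millions of real rows to run this on.
I have data like this: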
dt <- data.table(x=c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"), y=c(2,2,1,1,1,1,2,2,2,3),z=c("d","d","a","d","a","a","e","e","b", "a"))
x y z
1: aa 2 d
2: aa 2 d
3: aa 1 a
4: bb 1 d
5: cc 1 a
6: cc 1 a
7: cc 2 e
8: cc 2 e
9: cc 2 b
10: cc 3 a
I would like this:
dt.desired <- data.table(x=c("aa","aa", "bb","cc", "cc","cc", "cc"), y=c(1,2,1,1,2,2,3), z=c("a","d","d","a","b","e","a"), n=c(1,2,1,2,1,2,1))
x y z n
1: aa 1 a 1
2: aa 2 d 2
3: bb 1 d 1
4: cc 1 a 2
5: cc 2 b 1
6: cc 2 e 2
7: cc 3 a 1
Answer 0 (score: -1)
You can do this with count() from dplyr and the %>% pipe from magrittr, both available after loading tidyverse:
library(data.table)
library(tidyverse)
> dt %>% count(x,y,z)
# A tibble: 7 x 4
x y z n
<chr> <dbl> <chr> <int>
1 aa 1. a 1
2 aa 2. d 2
3 bb 1. d 1
4 cc 1. a 2
5 cc 2. b 1
6 cc 2. e 2
7 cc 3. a 1
If you want to keep the result as a new data frame, just assign it to a variable, for example:
z <- dt %>% count(x,y,z)
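Since the question asks for data.table specifically, the same counts can also be sketched with data.table's own grouping syntax. This is a minimal sketch, not a benchmarked solution: it assumes the dt from the question, and the result name counts is purely illustrative. .N gives the number of rows in each group, and keyby groups by x, y and z and sorts the result by those columns.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  x = c("aa", "aa", "aa", "bb", "cc", "cc", "cc", "cc", "cc", "cc"),
  y = c(2, 2, 1, 1, 1, 1, 2, 2, 2, 3),
  z = c("d", "d", "a", "d", "a", "a", "e", "e", "b", "a")
)

# .N is the row count per group; keyby groups by (x, y, z) and sorts the result.
# "counts" is a hypothetical name chosen for this sketch.
counts <- dt[, .(n = .N), keyby = .(x, y, z)]
print(counts)
```

Note that n comes back as an integer column here, whereas dt.desired in the question builds it as a double. Using by instead of keyby would keep the groups in order of first appearance rather than sorted.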