Count the number of duplicates in a data.table by two columns in R

Time: 2018-06-16 21:47:00

Tags: r group-by count data.table

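I am trying to count how many times each unique string value in column z occurs within each combination of two other columns (x, y) of a data.table. I would like to use the data.table package or something similarly fast, since I have millions of real rows to run this on :)

I have data like this: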

dt <- data.table(x=c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"), y=c(2,2,1,1,1,1,2,2,2,3),z=c("d","d","a","d","a","a","e","e","b", "a")) 

     x y z
 1: aa 2 d
 2: aa 2 d
 3: aa 1 a
 4: bb 1 d
 5: cc 1 a
 6: cc 1 a
 7: cc 2 e
 8: cc 2 e
 9: cc 2 b
10: cc 3 a

I want this:

dt.desired <- data.table(x=c("aa","aa", "bb","cc", "cc","cc", "cc"), y=c(1,2,1,1,2,2,3), z=c("a","d","d","a","b","e","a"), n=c(1,2,1,2,1,2,1))


    x y z n
1: aa 1 a 1
2: aa 2 d 2
3: bb 1 d 1
4: cc 1 a 2
5: cc 2 b 1
6: cc 2 e 2
7: cc 3 a 1

1 Answer:

Answer 0: (score: -1)

You can do this using dplyr and magrittr from the tidyverse:

library(data.table)
library(tidyverse)

> dt %>% count(x,y,z)
# A tibble: 7 x 4
  x         y z         n
  <chr> <dbl> <chr> <int>
1 aa       1. a         1
2 aa       2. d         2
3 bb       1. d         1
4 cc       1. a         2
5 cc       2. b         1
6 cc       2. e         2
7 cc       3. a         1

If you want to create a new data frame, just assign the result to a variable, for example:

z <- dt %>% count(x,y,z)
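Since the question asks for a data.table approach, here is a minimal sketch of the same count using data.table's own grouping syntax. It assumes the example `dt` from the question; the result name `counts` is only illustrative. `.N` is the number of rows in each group, and `keyby` groups by x, y, z and sorts the result by those columns.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  x = c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"),
  y = c(2,2,1,1,1,1,2,2,2,3),
  z = c("d","d","a","d","a","a","e","e","b","a")
)

# Count rows per (x, y, z) combination; .N is the group size.
# keyby sorts the result by the grouping columns (use by= to keep
# the groups in order of first appearance instead).
counts <- dt[, .(n = .N), keyby = .(x, y, z)]
print(counts)
```

On the example data this should give the same seven rows as `dt.desired`, with `n` stored as an integer column. Whether it is faster than the dplyr version on millions of rows would need to be measured on the real data.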