data.table: conditional counts in j

Date: 2016-10-13 14:21:57

Tags: r data.table

I have the following data.table in R:

library(data.table)

set.seed(5)
my_data <- data.table(cat_1 = c(1,1,1,2,2,1,1,1,3,4,5,4,5),
                      cat_2 = sample(c("A","B"), 13, replace = TRUE),
                      ao    = rnorm(13, 500, 10))

For each cat_1 I want the number of rows, the sum of ao, and the number of rows where cat_2 == "A". Ideally, I would get this:

merge(my_data[, .(cat1_lines = .N, total_ao = sum(ao, na.rm = TRUE)), by = cat_1],
      my_data[cat_2 == "A", .(A_lines = .N), by = cat_1],
      by = "cat_1", all.x = TRUE)

    cat_1 cat1_lines  total_ao A_lines
1:     1          6 3015.5034       1
2:     2          2 1015.8838       2
3:     3          1  516.9518      NA
4:     4          2  984.0768       2
5:     5          2  983.8361       2

Is there a way to do this in a single statement, without the merge? Something like this (I know it doesn't work):

my_data[, .(cat1_lines = .N, A_lines = .N[cat_2 == "A"],
            total_ao = sum(ao, na.rm = TRUE)), by = cat_1]
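Why this attempt fails: inside j, .N is a single integer (the row count of the current group), not a per-row vector. Indexing a length-1 vector with a logical vector as long as the group does not count the matching rows; positions past the end come back as NA. A plain-R sketch of the same indexing, where group_cat_2 and n are hypothetical stand-ins for one group's cat_2 values and its .N:

```r
# Hypothetical stand-ins: the cat_2 values of one group with three rows,
# and n playing the role of .N for that group (a single integer, here 3)
group_cat_2 <- c("A", "B", "A")
n <- length(group_cat_2)

# Indexing a length-1 vector with a length-3 logical vector:
# position 1 returns n, position 3 is out of bounds and returns NA
bad <- n[group_cat_2 == "A"]
print(bad)   # 3 NA

# Summing the logical comparison counts the TRUE values instead
good <- sum(group_cat_2 == "A")
print(good)  # 2
```

So .N[...] yields a vector of the wrong length with NAs rather than a count of "A" rows.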

1 answer:

Answer 0 (score: 4)

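You can do this in a single data.table call: compute all three summaries in j and group with by. Counting the "A" rows works because cat_2 == "A" is a logical vector and sum() counts its TRUE values. Try this: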

my_data[, .(cat1_lines = .N, total_ao = sum(ao, na.rm = TRUE), A_lines = sum(cat_2 == "A")), by = .(cat_1)]

   cat_1 cat1_lines  total_ao A_lines
1:     1          6 3015.5034       1
2:     2          2 1015.8838       2
3:     3          1  516.9518       0
4:     4          2  984.0768       2
5:     5          2  983.8361       2
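One difference from the merge() result: a group with no "A" rows gets 0 instead of NA (cat_1 = 3 above), because the grouped sum() sees that group, while the left join finds no matching row to take a count from. If NA is what you want, a minimal sketch is to run the same single grouped call and then overwrite the zeros by reference. The table dt below is small hypothetical data, not the question's seeded sample:

```r
library(data.table)

# Hypothetical deterministic data; cat_1 = 3 has no "A" rows
dt <- data.table(cat_1 = c(1, 1, 2, 3),
                 cat_2 = c("A", "B", "A", "B"),
                 ao    = c(10, 20, 30, 40))

# One grouped call: row count, sum of ao, and count of "A" rows per cat_1
res <- dt[, .(cat1_lines = .N,
              total_ao   = sum(ao, na.rm = TRUE),
              A_lines    = sum(cat_2 == "A")),
          by = cat_1]

# Match the merge(..., all.x = TRUE) output: groups with no "A" rows become NA
res[A_lines == 0L, A_lines := NA_integer_]
print(res)
```

Whether 0 or NA is better depends on how the column is used later; 0 is usually the more natural value for a count.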