I have the following data.table in R:
library(data.table)
set.seed(5)
my_data <- data.table(cat_1 = c(1,1,1,2,2,1,1,1,3,4,5,4,5),
                      cat_2 = sample(c("A","B"), 13, replace = TRUE),
                      ao = rnorm(13, 500, 10))
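I want to know the number of rows for each cat_1, the sum of ao for each cat_1, and the number of rows with a given cat_2 value (here "A") within each cat_1. Ideally, I would like to get this: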
merge(my_data[, .(cat1_lines = .N, total_ao = sum(ao, na.rm = T)), by = cat_1],
my_data[cat_2 == "A", .(A_lines = .N), by = cat_1], by = "cat_1", all.x = T)
cat_1 cat1_lines total_ao A_lines
1: 1 6 3015.5034 1
2: 2 2 1015.8838 2
3: 3 1 516.9518 NA
4: 4 2 984.0768 2
5: 5 2 983.8361 2
Is there a way to do this in a single statement, without the merge? Something like (I know this does not work):
my_data[, .(cat1_lines = .N, A_lines = .N[cat_2 == "A"],
total_ao = sum(ao, na.rm = T)), by = cat_1]
Answer 0 (score: 4)
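You can do this easily with the by argument of data.table, with no merge needed. Since cat_2 == "A" is a logical vector, sum(cat_2 == "A") counts the rows in each group where it is TRUE. Try this: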
my_data[,.(cat1_lines=.N,total_ao=sum(ao),A_lines=sum(cat_2=="A")),by=.(cat_1)]
cat_1 cat1_lines total_ao A_lines
1: 1 6 3015.5034 1
2: 2 2 1015.8838 2
3: 3 1 516.9518 0
4: 4 2 984.0768 2
5: 5 2 983.8361 2
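Note that the single-statement version returns 0 for cat_1 = 3, whereas the merge in the question returns NA. If the NA is what you want, a minimal sketch along these lines should reproduce it. The table dt and its values are hypothetical, written by hand so that the result is easy to check, and the last step is only one possible way to do the conversion:

```r
library(data.table)

# Hypothetical hand-written data (not the random data from the question)
dt <- data.table(
  cat_1 = c(1, 1, 2, 3),
  cat_2 = c("A", "B", "A", "B"),
  ao    = c(10, 20, 30, 40)
)

# One grouped statement: row count, sum of ao, and count of "A" rows
res <- dt[, .(
  cat1_lines = .N,
  total_ao   = sum(ao, na.rm = TRUE),
  A_lines    = sum(cat_2 == "A")   # TRUE counts as 1, FALSE as 0
), by = cat_1]

# Optional: replace zero counts with NA to mirror merge(..., all.x = TRUE)
res[A_lines == 0L, A_lines := NA_integer_]

print(res)
```

The := update modifies res by reference, so no reassignment is needed. If a count of 0 is fine for your purposes, simply drop that last step.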