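I created a dummy data table named DT. I am trying to compute the sum of Capacity (numeric) and the frequencies of Code and State (categorical) within each ID. In the final result I want to show, for each unique ID, the total Capacity plus the counts of A, B, C, ... and of each State. So the column names would be ID, total.Cap, A, B, C, ..., AZ, CA, ...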
library(data.table)
DT <- data.table(ID = rep(1:500, 100),
                 Capacity = sample(1:1000, size = 50000, replace = TRUE),
                 Code = sample(LETTERS[1:26], 50000, replace = TRUE),
                 State = rep(c("AZ", "CA", "PA", "NY", "WA", "SD"), 50000))
The result should look like the table below:
ID total.Cap A B C ... AZ CA ...
1 28123 10 25 70 ... 29 ...
2 32182 20 42 50 ... 30 ...
3
I tried ddply, melt and dcast, but the result did not come out the way I wanted. Can anyone give me some hints on how to build this table? Thanks!
Answer 0 (score: 1)
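You can build the totals, the state counts and the code counts with three separate data.table statements and then join them. For State and Code you can use dcast to turn each state/code into its own column holding the number of rows with that state/code for each ID.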
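To make the dcast step concrete, here is a minimal sketch on a hypothetical five-row table `toy` (made-up data, not from the question). With `fun.aggregate = length`, each cell is the number of rows for that ID/State pair, and combinations that never occur are filled with 0, so ID 1 should get AZ = 2, CA = 1, NY = 0.

```r
library(data.table)

# Hypothetical toy data: ID 1 has two AZ rows and one CA row; ID 2 has one CA and one NY row
toy <- data.table(ID = c(1L, 1L, 1L, 2L, 2L),
                  Capacity = c(10L, 20L, 30L, 40L, 50L),
                  State = c("AZ", "AZ", "CA", "CA", "NY"))

# Count rows per ID/State pair; value.var only supplies a column for length() to count
wide <- dcast(toy, ID ~ State, value.var = "Capacity", fun.aggregate = length)
print(wide)
```

Any column outside the formula works as `value.var` here, because only the number of rows in each cell matters.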
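library(data.table)
# Sum of Capacity per ID
totals <- DT[, list(total.Cap = sum(Capacity)), by = "ID"]
# One column per State / Code, each cell counting that ID's rows
states <- dcast(DT, ID ~ State, value.var = "Capacity", fun.aggregate = length)
codes <- dcast(DT, ID ~ Code, value.var = "Capacity", fun.aggregate = length)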
Then you can join the three tables together on ID:
result <- totals[states, on = "ID"][codes, on = "ID"]
The first rows of the result look like this (the remaining code columns, V to Z, are not shown):
ID total.Cap AZ CA NY PA SD WA A B C D E F G H I J K L M N O P Q R S T U
1: 1 287526 200 0 0 200 0 200 12 18 24 42 12 30 30 18 6 36 24 6 18 24 30 24 6 24 36 18 30
2: 2 293838 0 200 200 0 200 0 18 24 42 30 30 12 24 6 24 12 48 42 18 18 42 24 24 24 12 18 24
3: 3 279450 200 0 0 200 0 200 24 18 24 6 12 12 18 12 12 30 24 18 54 30 6 42 18 30 24 24 18
4: 4 298200 0 200 200 0 200 0 30 30 36 30 36 24 24 18 24 18 30 30 30 24 6 30 18 6 18 18 18
5: 5 294084 200 0 0 200 0 200 18 6 24 12 42 12 18 42 18 18 18 18 24 24 30 18 30 24 6 30 24
Note that if you have many categorical columns like State and Code, you can melt them first and build all the count columns in one step:
# replace State and Code with the categorical variables you want
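melted <- melt(DT, measure.vars = c("State", "Code"))
# Count rows per ID/value pair (Capacity is only used as something to count)
state_codes <- dcast(melted, ID ~ value, value.var = "Capacity", fun.aggregate = length)
totals[state_codes, on = "ID"]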
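Note that you still need to join in the totals, and this does not preserve a column grouping such as "states, then codes" (or vice versa): dcast sorts all the new columns together, so state and code columns end up interleaved.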
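If the "states, then codes" grouping matters, one option is to reorder the columns after the join with `setcolorder`. The sketch below is an illustration only: it uses a small hypothetical stand-in for DT (5 IDs, 3 codes, 3 states) so it runs on its own, and takes the state and code column names from the data.

```r
library(data.table)

# Small hypothetical stand-in for DT so the sketch runs on its own
set.seed(1)
DT <- data.table(ID = rep(1:5, 20),
                 Capacity = sample(1:1000, 100, replace = TRUE),
                 Code = sample(LETTERS[1:3], 100, replace = TRUE),
                 State = sample(c("AZ", "CA", "PA"), 100, replace = TRUE))

totals <- DT[, list(total.Cap = sum(Capacity)), by = ID]
melted <- melt(DT, measure.vars = c("State", "Code"))
state_codes <- dcast(melted, ID ~ value, value.var = "Capacity", fun.aggregate = length)
result <- totals[state_codes, on = "ID"]

# dcast sorted the state and code columns together; regroup them
# as "states first, then codes" (setcolorder works by reference)
state_cols <- sort(unique(DT$State))
code_cols <- sort(unique(DT$Code))
setcolorder(result, c("ID", "total.Cap", state_cols, code_cols))
print(result)
```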
Answer 1 (score: 0)
This builds the total.Cap, Code and State summaries as three separate data tables and then merges them by ID:
# Storing intermediate pieces
total_cap <- DT[, j = list(total.Cap = sum(Capacity)), by = ID]
# Count rows per ID/Code (and ID/State), then spread the counts N into columns
code <- dcast(DT[, .N, by = c("ID", "Code")], ID ~ Code, value.var = "N", fill = 0)
state <- dcast(DT[, .N, by = c("ID", "State")], ID ~ State, value.var = "N", fill = 0)
mytable <- merge(total_cap, code, by = "ID")
mytable <- merge(mytable, state, by = "ID")
mytable

# As a single expression
mytable <- merge(
  merge(DT[, j = list(total.Cap = sum(Capacity)), by = ID],
        dcast(DT[, .N, by = c("ID", "Code")], ID ~ Code, value.var = "N", fill = 0),
        by = "ID"),
  dcast(DT[, .N, by = c("ID", "State")], ID ~ State, value.var = "N", fill = 0),
  by = "ID")
mytable
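With more than two categorical columns, the nested merge calls can be generated instead of written out. The sketch below is one possible way, not part of the original answer: it builds one count table per column with `lapply` and folds `merge` over them with `Reduce`. The names `cat_cols` and `count_tables` are hypothetical, and DT is replaced by a small stand-in so the example runs on its own.

```r
library(data.table)

# Small hypothetical stand-in for DT so the sketch runs on its own
set.seed(42)
DT <- data.table(ID = rep(1:5, 20),
                 Capacity = sample(1:1000, 100, replace = TRUE),
                 Code = sample(LETTERS[1:3], 100, replace = TRUE),
                 State = sample(c("AZ", "CA", "PA"), 100, replace = TRUE))

# Hypothetical list of categorical columns to count; extend as needed
cat_cols <- c("Code", "State")

# One wide count table per categorical column, built the same way as above
count_tables <- lapply(cat_cols, function(col) {
  counts <- DT[, .N, by = c("ID", col)]
  dcast(counts, as.formula(paste("ID ~", col)), value.var = "N", fill = 0)
})

# Fold merge() over the totals plus every count table, joining on ID
mytable <- Reduce(function(x, y) merge(x, y, by = "ID"),
                  c(list(DT[, list(total.Cap = sum(Capacity)), by = ID]), count_tables))
print(mytable)
```

Columns come out in the order of `cat_cols` (here codes, then states), since each merge appends the new table's columns after the existing ones.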