My data is structured as follows:
library(data.table)
DT <- data.table(Id = c(1, 1, 1, 1, 10, 100, 100, 101, 101, 101),
Date = as.Date(c("1997-01-01", "1997-01-02", "1997-01-03", "1997-01-04",
"1997-01-02", "1997-01-02", "1997-01-04", "1997-01-03",
"1997-01-04", "1997-01-04")),
group = c(1,1,1,1,1,2,2,2,2,2),
Price.1 = c(29, 25, 14, 26, 30, 16, 13, 62, 12, 6),
Price.2 = c(4, 5, 6, 6, 8, 2, 3, 5, 7, 8))
> DT
     Id       Date group Price.1 Price.2
 1:   1 1997-01-01     1      29       4
 2:   1 1997-01-02     1      25       5
 3:   1 1997-01-03     1      14       6
 4:   1 1997-01-04     1      26       6
 5:  10 1997-01-02     1      30       8
 6: 100 1997-01-02     2      16       2
 7: 100 1997-01-04     2      13       3
 8: 101 1997-01-03     2      62       5
 9: 101 1997-01-04     2      12       7
10: 101 1997-01-04     2       6       8
I am trying to cast it (using dcast.data.table):
dcast.data.table(DT, Id ~ Date, fun = sum, value.var = "Price.1")
dcast.data.table(DT, Id ~ group, fun = sum, value.var = "Price.1")
dcast.data.table(DT, Id ~ Date, fun = sum, value.var = "Price.2")
dcast.data.table(DT, Id ~ group, fun = sum, value.var = "Price.2")
But instead of 4 separate outputs, I would like to get the following:
    Id 1997-01-01 1997-01-02 1997-01-03 1997-01-04  1  2   Price
1:   1         29         25         14         26 94  0 Price.1
2:  10          0         30          0          0 30  0 Price.1
3: 100          0         16          0         13  0 29 Price.1
4: 101          0          0         62         18  0 80 Price.1
5:   1          4          5          6          6 21  0 Price.2
6:  10          0          8          0          0  8  0 Price.2
7: 100          0          2          0          3  0  5 Price.2
8: 101          0          0          5         15  0 20 Price.2
My workaround is to use rbind, cbind and merge:
cbind(
  rbind(
    # Price.1: per-date sums merged with per-group sums
    merge(dcast.data.table(DT, Id ~ Date,  fun.aggregate = sum, value.var = "Price.1"),
          dcast.data.table(DT, Id ~ group, fun.aggregate = sum, value.var = "Price.1"),
          by = "Id", all.x = TRUE),
    # Price.2: same pair of casts
    merge(dcast.data.table(DT, Id ~ Date,  fun.aggregate = sum, value.var = "Price.2"),
          dcast.data.table(DT, Id ~ group, fun.aggregate = sum, value.var = "Price.2"),
          by = "Id", all.x = TRUE)
  ),
  Price = rep(c("Price.1", "Price.2"), each = 4)
)
Is there an existing, cleaner way to do this?
Answer 0 (score: 5)
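I am assuming that each Id maps to a unique group, so I drop that variable before reshaping; otherwise this is essentially the same as @user227710's answer.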
Idg <- unique(DT[,.(Id,group)])
DT[,group:=NULL]
res <- dcast(
melt(DT, id.vars = c("Id","Date")),
variable+Id ~ Date,
value.var = "value",
fill = 0,
margins = "Date",
fun.aggregate = sum
)
# and if you want the group back...
setDT(res) # needed before data.table 1.9.5, where using dcast.data.table is another option
setkey(res,Id)
res[Idg][order(variable,Id)]
which gives
   variable  Id 1997-01-01 1997-01-02 1997-01-03 1997-01-04 (all) group
1:  Price.1   1         29         25         14         26    94     1
2:  Price.2   1          4          5          6          6    21     1
3:  Price.1  10          0         30          0          0    30     1
4:  Price.2  10          0          8          0          0     8     1
5:  Price.1 100          0         16          0         13    29     2
6:  Price.2 100          0          2          0          3     5     2
7:  Price.1 101          0          0         62         18    80     2
8:  Price.2 101          0          0          5         15    20     2
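The output above carries a single overall total rather than the separate per-group columns shown in the question. A minimal sketch of one way to get both, assuming a recent data.table in which melt and dcast dispatch to data.table's own methods: melt once, cast the long table twice (by Date and by group), and merge on the shared keys. The names long, by_date, by_group and wide are illustrative and do not come from the answers here.

```r
library(data.table)

# Same data as in the question, rebuilt so the sketch is self-contained
DT <- data.table(
  Id      = c(1, 1, 1, 1, 10, 100, 100, 101, 101, 101),
  Date    = as.Date(c("1997-01-01", "1997-01-02", "1997-01-03", "1997-01-04",
                      "1997-01-02", "1997-01-02", "1997-01-04", "1997-01-03",
                      "1997-01-04", "1997-01-04")),
  group   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Price.1 = c(29, 25, 14, 26, 30, 16, 13, 62, 12, 6),
  Price.2 = c(4, 5, 6, 6, 8, 2, 3, 5, 7, 8)
)

# Long format: one row per (Id, Date, group, price variable)
long <- melt(DT, id.vars = c("Id", "Date", "group"),
             variable.name = "Price", value.name = "value")

# One cast per column dimension, both keyed by Price and Id
by_date  <- dcast(long, Price + Id ~ Date,  value.var = "value",
                  fun.aggregate = sum, fill = 0)
by_group <- dcast(long, Price + Id ~ group, value.var = "value",
                  fun.aggregate = sum, fill = 0)

# Combine the date columns and the group columns side by side
wide <- merge(by_date, by_group, by = c("Price", "Id"))
print(wide)
```

This replaces the four casts plus rbind/cbind of the workaround with two casts and one merge, and the Price label comes from the melt instead of being typed out by hand.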
Answer 1 (score: 4)
This was really a matter of trial and error; I hope it works.
library(data.table) #version 1.9.4
library(reshape2)
kk <- melt(DT,id.vars=c("Id","Date","group"),
measure.vars = c("Price.1","Price.2"),
value.name = "Price")
dcast(kk, Id + variable + group ~ Date, value.var = "Price", fun = sum,margins="Date")
# ^ use of margins borrowed from @Frank.
#    Id variable group 1997-01-01 1997-01-02 1997-01-03 1997-01-04 (all)
# 1   1  Price.1     1         29         25         14         26    94
# 2   1  Price.2     1          4          5          6          6    21
# 3  10  Price.1     1          0         30          0          0    30
# 4  10  Price.2     1          0          8          0          0     8
# 5 100  Price.1     2          0         16          0         13    29
# 6 100  Price.2     2          0          2          0          3     5
# 7 101  Price.1     2          0          0         62         18    80
# 8 101  Price.2     2          0          0          5         15    20
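The "(all)" column here comes from the margins argument of reshape2's dcast. As far as I know, data.table's own dcast documents margins as not implemented, so with data.table alone the row total has to be added by hand. A hedged sketch of that, where the column name total and the helper variable date_cols are my own choices and not part of either package:

```r
library(data.table)

# Same data as in the question, rebuilt so the sketch is self-contained
DT <- data.table(
  Id      = c(1, 1, 1, 1, 10, 100, 100, 101, 101, 101),
  Date    = as.Date(c("1997-01-01", "1997-01-02", "1997-01-03", "1997-01-04",
                      "1997-01-02", "1997-01-02", "1997-01-04", "1997-01-03",
                      "1997-01-04", "1997-01-04")),
  group   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Price.1 = c(29, 25, 14, 26, 30, 16, 13, 62, 12, 6),
  Price.2 = c(4, 5, 6, 6, 8, 2, 3, 5, 7, 8)
)

kk <- melt(DT, id.vars = c("Id", "Date", "group"),
           measure.vars = c("Price.1", "Price.2"),
           value.name = "Price")

# Cast without margins, filling missing Id/Date combinations with 0
res <- dcast(kk, Id + variable + group ~ Date, value.var = "Price",
             fun.aggregate = sum, fill = 0)

# Every column that is not an id column is a date column
date_cols <- setdiff(names(res), c("Id", "variable", "group"))

# Row total across the date columns, standing in for reshape2's "(all)"
res[, total := rowSums(.SD), .SDcols = date_cols]
print(res)
```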
Answer 2 (score: 0)
Just for comparison, a solution in dplyr (since I haven't yet learned how to get my brain to melt things properly).