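I need to build inventory data from a data frame. Each row of the data frame has a begin date and an end date, marking the period during which an item was in stock. I want to total the stock level per item for each day and then create a time series from the result.
My data has the following form: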
A <- c("a","b","a","b","c")
begindate <- as.Date(c("2014-01-01", "2014-01-03", "2014-01-03", "2014-01-02", "2014-01-02"))
enddate <- as.Date(c("2014-01-04", "2014-01-05", "2014-01-06", "2014-01-04", "2014-01-06"))
source <- data.frame(A, begindate, enddate)
source
A begindate enddate
1 a 2014-01-01 2014-01-04
2 b 2014-01-03 2014-01-05
3 a 2014-01-03 2014-01-06
4 b 2014-01-02 2014-01-04
5 c 2014-01-02 2014-01-06
What I want to create from this data is a time series like this:
  2014-01-01 2014-01-02 2014-01-03 2014-01-04 2014-01-05 2014-01-06
a          1          1          2          2          1          1
b                     1          2          2          1
c                     1          1          1          1          1
The original data is fairly large, about 180k rows. What would be an efficient way to do this?
Edit
The answer given by David Arenburg works well:
library(data.table)
library(reshape2)
setDT(mydata)[, indx := .I]
mydata <- mydata[, list(A = A, seq(begindate, enddate, by = 1)), by = indx]
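However, casting the expanded table directly is slow on my data. Adding an intermediate pre-aggregation step sped up the cast considerably.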
# intermediate step (pre-aggregation): count rows per item and day
mydata_aggregated <- mydata[, list(number_cases = length(indx)), by = list(A, V2)]
# cast the aggregated table; note that rows and columns are switched here,
# since I found it easier to pass the data to zoo or xts in that shape
mydata_series <- dcast(mydata_aggregated, V2 ~ A, value.var = "number_cases")
# create the zoo object: item columns as the data, the date column as the index
# (as.matrix and $V2 work whether dcast returned a data.frame or a data.table)
library(zoo)
mydata_zoo <- zoo(as.matrix(mydata_series[, -1]), order.by = mydata_series$V2)
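For reference, here is a minimal end-to-end sketch of the steps above, run on the sample data from the question. The names `stock`, `daily`, `counts`, `series` and `day` are hypothetical, chosen only for this illustration, and `.N` is used in place of `length(indx)`. It assumes a data.table version that provides the `dcast` method for data.tables.

```r
library(data.table)

# Sample data from the question, under a hypothetical name (`stock`)
stock <- data.table(
  A         = c("a", "b", "a", "b", "c"),
  begindate = as.Date(c("2014-01-01", "2014-01-03", "2014-01-03", "2014-01-02", "2014-01-02")),
  enddate   = as.Date(c("2014-01-04", "2014-01-05", "2014-01-06", "2014-01-04", "2014-01-06"))
)

# Expand every row into one row per day the item was in stock
stock[, indx := .I]
daily <- stock[, list(A = A, day = seq(begindate, enddate, by = 1)), by = indx]

# Pre-aggregate: number of rows per item and day (.N is the group size)
counts <- daily[, list(number_cases = .N), by = list(A, day)]

# Cast the smaller aggregated table: days in rows, items in columns,
# with 0 for days on which an item was not in stock
series <- dcast(counts, day ~ A, value.var = "number_cases", fill = 0L)
print(series)
```

The cast then only has to reshape one row per item and day rather than aggregate every expanded row, which is presumably where the speed-up comes from.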
Answer 0 (score: 1)
If your data set is large, I would use data.table:
library(data.table)
library(reshape2)
setDT(source)[, indx := .I]
source <- source[, list(A = A, seq.int(begindate, enddate, by = 1)), by = indx]
dcast.data.table(source, A ~ V2, value.var = "V2", length)
## A 2014-01-01 2014-01-02 2014-01-03 2014-01-04 2014-01-05 2014-01-06
##1 a 1 1 2 2 1 1
##2 b 0 1 2 2 1 0
##3 c 0 1 1 1 1 1
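If expanding every row into one row per day becomes too expensive for long periods, a different technique may be worth trying. This is not the method used in this answer, only a sketch under assumptions: record +1 on each begin date and -1 on the day after each end date, then take a cumulative sum per item. All names here (`stock`, `events`, `grid`, `daily_levels`, `change`, `stock_level`) are hypothetical.

```r
library(data.table)

# Sample data from the question, under a hypothetical name (`stock`)
stock <- data.table(
  A         = c("a", "b", "a", "b", "c"),
  begindate = as.Date(c("2014-01-01", "2014-01-03", "2014-01-03", "2014-01-02", "2014-01-02")),
  enddate   = as.Date(c("2014-01-04", "2014-01-05", "2014-01-06", "2014-01-04", "2014-01-06"))
)

# +1 on the first day in stock, -1 on the day after the last day in stock
events <- rbind(
  stock[, list(A, day = begindate,   change =  1L)],
  stock[, list(A, day = enddate + 1, change = -1L)]
)
# Net change per item and day
events <- events[, list(change = sum(change)), by = list(A, day)]

# Full grid of items and days; CJ returns it sorted by A, then day
all_days <- seq(min(stock$begindate), max(stock$enddate), by = 1)
grid <- CJ(A = unique(stock$A), day = all_days)

# Attach the changes to the grid (0 where nothing changes), then accumulate
daily_levels <- events[grid, on = c("A", "day")]
daily_levels[is.na(change), change := 0L]
daily_levels[, stock_level := cumsum(change), by = A]

# Wide layout as in the answer: one row per item, one column per day
wide <- dcast(daily_levels, A ~ day, value.var = "stock_level")
print(wide)
```

The intermediate tables here grow with the number of rows plus the number of item-day combinations, not with the total number of days covered by all periods.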
Just a side note: source is the name of a built-in function in R, so try to use a different name for your data set.