I have a dataset of portfolio holdings:
# Input test data
portolios <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "C", "C", "A"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"), shares = c(100L, 100L, 130L, 50L, 75L, 80L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares
1: 1 A 2010-03-31 Full 100
2: 1 B 2010-03-31 Full 100
3: 1 A 2010-06-30 Full 130
4: 1 C 2010-06-30 Full 50
5: 1 C 2010-07-15 Partial 75
6: 1 A 2010-08-31 Full 80
I need to impute the following missing records:
7: 1 B 2010-06-30 Full 0
8: 1 C 2010-08-31 Full 0
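The business problem is that the sale of a position (shares = 0) is sometimes not reported in a Full report_type, so the missing secid must be imputed from the previous report.
Ultimately, I want to calculate the change in shares for each secid from the previous report, within each portfolioid, so that my dataset looks like this: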
changes <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "B", "C", "C", "A", "C"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Full", "Partial", "Full", "Full"), shares = c(100L, 100L, 130L, 0L, 50L, 75L, 80L, 0L), change = c(100L, 100L, 30L, -100L, 50L, 25L, -50L, -75L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares", "change"), row.names = c(NA, -8L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares change
1: 1 A 2010-03-31 Full 100 100
2: 1 B 2010-03-31 Full 100 100
3: 1 A 2010-06-30 Full 130 30
4: 1 B 2010-06-30 Full 0 -100
5: 1 C 2010-06-30 Full 50 50
6: 1 C 2010-07-15 Partial 75 25
7: 1 A 2010-08-31 Full 80 -50
8: 1 C 2010-08-31 Full 0 -75
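I am stuck on how to create the i for an outer join portolios[i]. My problem is that I don't want to use i <- CJ(reportdate, secid), because it produces too many unnecessary records: not every secid exists on every reportdate, so the cross join does not correctly represent the data that needs to be filled in.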
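To make the size concern concrete, here is a minimal sketch that rebuilds the six test rows and counts what a full cross join would produce. The names holdings and grid are illustrative and not part of the original data.

```r
library(data.table)

# Same six rows as the test data, rebuilt so the snippet runs on its own
holdings <- data.table(
  portfolioid = 1L,
  secid       = c("A", "B", "A", "C", "C", "A"),
  reportdate  = as.IDate(c("2010-03-31", "2010-03-31", "2010-06-30",
                           "2010-06-30", "2010-07-15", "2010-08-31")),
  report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"),
  shares      = c(100L, 100L, 130L, 50L, 75L, 80L)
)

# Cross join of every distinct report date with every distinct security
grid <- CJ(reportdate = unique(holdings$reportdate),
           secid      = unique(holdings$secid))

nrow(grid)      # 4 dates x 3 securities = 12 candidate rows
nrow(holdings)  # 6 rows actually reported
```

The desired result has 8 rows, so on this tiny example the cross join already proposes four rows that should never exist, and the gap grows with the number of dates and securities.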
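I think I need to join on reportdate,reportdate[-1,secid]
I want to roll the secid forward and set shares := 0 when a secid is missing from a Full report but was present in the previous report (Partial or Full). I believe I would use roll = 1, but I am not sure where or how to implement it.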
I think my question is similar to
How to Calculate a rolling statistic in R using data.table on unevenly spaced data
I am sure I am missing some basic understanding, or a CJ() trick, that would create the necessary i.
Answer 0 (score: 1)
Something like this should work (if I understand correctly).
First, convert reportdate to a date class, and also get the unique dates:
portolios[, reportdate := as.IDate(reportdate)]
uniq.dts <- unique(portolios$reportdate)
uniq.dts <- uniq.dts[order(uniq.dts)]
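Perform a self join by each i and extract only the dates that occur after the last date on which we know the secid existed (this should be more memory efficient than CJ):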
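# Key by secid so the self join matches rows on secid
setkey(portolios,secid)
setorder(portolios,secid,reportdate)
# For each row of i, j sees every x row with the same secid, so
# max(reportdate) is the last date on which that secid was reported
impute <- portolios[portolios, {
  tmp = max(reportdate) < uniq.dts;
  list(portfolioid=1,reportdate=uniq.dts[tmp][1],report_type="Full",shares=0)
},on="secid",by=.EACHI][!is.na(reportdate)][,.SD[1],by=secid]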
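The j expression above relies on a logical mask over the sorted unique dates. A minimal sketch of that mechanism in isolation, with illustrative names (dates, uniq, later, none) that are not from the original answer:

```r
library(data.table)

dates <- as.IDate(c("2010-06-30", "2010-03-31", "2010-06-30", "2010-08-31"))
uniq  <- sort(unique(dates))   # distinct report dates in chronological order

# Comparing against a single date yields one TRUE/FALSE per unique date
later <- uniq > as.IDate("2010-03-31")

# First report date strictly after the reference date
next_date <- uniq[later][1]

# When no later date exists the subset is empty and [1] gives NA,
# which is what the !is.na(reportdate) filter removes afterwards
none <- uniq[uniq > as.IDate("2010-08-31")][1]
```

Here next_date is the report date that follows 2010-03-31, and none is NA because 2010-08-31 is the last report date.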
Next, rbindlist the original table and the impute table.
portolios <- rbindlist(list(portolios,impute),fill=TRUE)
#Order data by secid and reportdate
portolios <- portolios[order(secid,reportdate)]
#Lag data by group
portolios[, prev.shares := shift(shares), by=secid]
#Calculate change WHEN a previous share amount exists
portolios[, change := ifelse(is.na(prev.shares),shares,shares-prev.shares), by=secid]
print(portolios[order(reportdate)])
portfolioid secid reportdate report_type shares prev.shares change
1: 1 A 2010-03-31 Full 100 NA 100
2: 1 B 2010-03-31 Full 100 NA 100
3: 1 A 2010-06-30 Full 130 100 30
4: 1 B 2010-06-30 Full 0 100 -100
5: 1 C 2010-06-30 Full 50 NA 50
6: 1 C 2010-07-15 Partial 75 50 25
7: 1 A 2010-08-31 Full 80 130 -50
8: 1 C 2010-08-31 Full 0 75 -75
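The self join in this answer looks only past each secid's last reported date, which is enough for the test data. As a sketch of a more literal reading of the rule stated in the question (a secid held in an earlier report but absent from a later Full report gets shares = 0), the loop below walks the Full report dates in order. The names holdings, filled, full_dates and gone are illustrative and not from the original post, and the sketch assumes a position counts as held when its most recent earlier shares value is positive.

```r
library(data.table)

holdings <- data.table(
  portfolioid = 1L,
  secid       = c("A", "B", "A", "C", "C", "A"),
  reportdate  = as.IDate(c("2010-03-31", "2010-03-31", "2010-06-30",
                           "2010-06-30", "2010-07-15", "2010-08-31")),
  report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"),
  shares      = c(100L, 100L, 130L, 50L, 75L, 80L)
)

keys       <- c("portfolioid", "secid")
full_dates <- sort(unique(holdings[report_type == "Full", reportdate]))
filled     <- copy(holdings)

for (k in seq_along(full_dates)) {
  d <- full_dates[k]                      # indexing keeps the IDate class
  earlier <- filled[reportdate < d]
  if (nrow(earlier) == 0L) next
  # Most recent earlier record per security, including rows imputed so far
  last_known <- earlier[order(reportdate), .SD[.N], by = keys]
  # Still held according to the last record, but absent from this Full report
  gone <- last_known[shares > 0L][!filled[reportdate == d], on = keys]
  if (nrow(gone) > 0L) {
    gone[, `:=`(reportdate = d, report_type = "Full", shares = 0L)]
    filled <- rbindlist(list(filled, gone), use.names = TRUE)
  }
}

# Change versus the previous record of the same security (0 if none)
setorderv(filled, c(keys, "reportdate"))
filled[, change := shares - shift(shares, fill = 0L), by = keys]
filled[order(reportdate, secid)]
```

Because imputed zero rows are appended to filled before the next date is processed, a security that was already zeroed out is not imputed again at later Full reports. The loop is written for clarity rather than speed; for many report dates a join-based formulation would likely scale better.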