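I have a dataset containing several venture capital firms (VC), their funds, and each fund's performance, year and fund size. Each fund has one performance figure. To test for performance persistence, I want to find the performance of the preceding fund ("PreviousPerformance"). PreviousPerformance is the performance of the previous FUND of the same VC. The Year column therefore determines which fund came first, which came second, and so on.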
VC          FUND  Performance  Year  FundSize
A Partners  A     0.30         2005  1
B Capital   B5    0.20         2008  2
B Capital   B4    0.10         2003  3
B Capital   B3    0.25         2001  4
B Capital   B2    0.20         2001  5
B Capital   B1    0.10         2000  6
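For example, the PreviousPerformance of fund "B5" is 0.1, which is the performance of B4.
Sometimes it is not clear which fund is the immediately preceding one. For example, B4 has two candidate predecessors, B2 and B3, both launched in 2001. In that case I want PreviousPerformance to be the performance of the FUND with the largest FundSize (for B4, this is B2, whose FundSize is 5). If a fund has no predecessor, PreviousPerformance = "-".
In the end, the dataset should look like this: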
VC          FUND  Performance  Year  FundSize  PreviousPerformance
A Partners  A     0.30         2005  1         -
B Capital   B5    0.20         2008  2         0.1
B Capital   B4    0.10         2003  3         0.2
B Capital   B3    0.25         2001  4         0.1
B Capital   B2    0.20         2001  5         0.1
B Capital   B1    0.10         2000  6         -
I don't know which function is suited to this kind of problem. Does anyone have a suggestion?
library(data.table)
FundPerformance <- data.table(
  VC = c("A Partners", rep("B Capital", 5)),
  FUND = c("A", "B5", "B4", "B3", "B2", "B1"),
  Performance = c(0.3, 0.2, 0.1, 0.25, 0.2, 0.1),
  Year = c(2005, 2008, 2003, 2001, 2001, 2000),
  FundSize = 1:6,
  # desired result; mixing "-" with numbers makes this a character column
  PreviousPerformance = c("-", 0.1, 0.2, 0.1, 0.1, "-")
)
Answer 0 (score: 0)
I would do the following. I am not a frequent data.table user, though, so someone else may have a cleaner solution.
# set up data
library(data.table)
dt <- data.table(VC = c("A Partners", rep("B Capital", 5)),
                 Fund = c("A", "B5", "B4", "B3", "B2", "B1"),
                 Performance = c(0.3, 0.2, 0.1, 0.25, 0.2, 0.1),
                 Year = c(2005, 2008, 2003, 2001, 2001, 2000),
                 FundSize = 1:6,
                 # expected result, kept only for comparison with PreviousPerformanceII
                 PreviousPerformance = c(NA, 0.1, 0.2, 0.1, 0.1, NA))
setkey(dt, VC, Year)
# find the performance of the largest fund within each VC and year
dt[, PreviousPerformanceII := Performance[which.max(FundSize)], by = key(dt)]
# keep one row per VC and year (unique() takes `by`, not `keyby`)
dtUnique <- unique(dt, by = key(dt))
dtUnique <- dtUnique[, list(VC, Year, PreviousPerformanceII)]
# shift the lookup table forward one year so that the rolling join
# below can only match strictly earlier years
dtUnique[, Year := Year + 1]
setkey(dtUnique, VC, Year)
# rolling join: for each fund, carry forward the most recent earlier value
dtNew <- dtUnique[dt, roll = TRUE]
# clean up data: drop the helper column carried over from dt
dtNew[, i.PreviousPerformanceII := NULL]
setkey(dtNew, VC, Year)
dtNew
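For comparison, here is a minimal sketch of a variant that avoids the rolling join. It first reduces the data to one representative per VC and year (the fund with the largest FundSize), lags that value within each VC using `shift()`, and then joins the result back onto every fund. The names `funds`, `reps` and `RepPerformance` are my own and not from the question, and the sketch assumes that missing predecessors should be `NA` rather than "-".

```r
library(data.table)

# Sample data from the question, without the PreviousPerformance column
funds <- data.table(
  VC          = c("A Partners", rep("B Capital", 5)),
  FUND        = c("A", "B5", "B4", "B3", "B2", "B1"),
  Performance = c(0.30, 0.20, 0.10, 0.25, 0.20, 0.10),
  Year        = c(2005, 2008, 2003, 2001, 2001, 2000),
  FundSize    = 1:6
)

# One representative per VC and year: the fund with the largest FundSize.
# keyby sorts the result by VC and Year, which shift() below relies on.
reps <- funds[, .(RepPerformance = Performance[which.max(FundSize)]),
              keyby = .(VC, Year)]

# Within each VC, take the representative performance of the previous year on record
reps[, PreviousPerformance := shift(RepPerformance), by = VC]

# Update join: copy the lagged value onto every fund of the same VC and year
funds[reps, PreviousPerformance := i.PreviousPerformance, on = .(VC, Year)]

funds
# PreviousPerformance in row order A, B5, B4, B3, B2, B1:
# NA, 0.1, 0.2, 0.1, 0.1, NA
```

Funds that share a year (B2 and B3 here) both receive the value from the preceding year, so neither is treated as the other's predecessor. If the "-" placeholder is needed for display, the column can be converted to character afterwards, at the cost of losing the numeric type.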