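I searched for how to replicate INDEX/MATCH in R and found solutions that work for smaller datasets, but not for my two data frames, which hold (a) five years of daily rate history for several swap rates and (b) individual loan details with more than 200,000 records. I get the result in Excel with INDEX(MATCH), but would like to learn an efficient way to do this in R.
In short: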
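I want to match/join the data so that, for each loan, only the rate on its origination date from the appropriate index goes into a new column in LoanData. I also tried the spread()/gather() suggestion from another post; it works for this small example but uses too much memory on my full dataset.
I tried the approach shown further below, but it gave me a matrix result. Sample data: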
RateData <- data.frame(Date = c("2018-01-01","2018-01-05","2018-01-08","2018-01-17"),
Threeyr = c(1.25,1.27,1.29, 1.30),
Fiveyr = c(2.3,2.31,2.34, 2.4),
Tenyr = c(2.8,2.89,2.75, 2.6),
PRIME = c(4.0,4,4, 4.25))
LoanData <- data.frame(OriginationDate = c("2018-01-01","2018-01-01","2018-01-01","2018-01-05",
"2018-01-08","2018-01-08","2018-01-17"),
LNTYPE = c(83,101,115,83,83,105,115),
PriceIndex = c('Threeyr','Fiveyr','PRIME','Threeyr','Threeyr','Fiveyr','PRIME'))
How do I get the desired result? My attempt:
LoanData$Rate <- RateData[match(LoanData$OriginationDate,RateData$Date),
match(LoanData$PriceIndex,colnames(RateData))]
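Any help is greatly appreciated!
Answer 0 (score: 1)
You were very close. You just have to subscript with a matrix to index several dimensions at once; otherwise you get every combination of the row and column indices. The correct version is: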
LoanData$Rate <- RateData[cbind(
match(LoanData$OriginationDate, RateData$Date),
match(LoanData$PriceIndex, colnames(RateData))
)]
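To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question and compares the two subscripting styles. It indexes a numeric matrix holding only the rate columns rather than the whole data frame, on the assumption that matrix subscripting on a data frame with a character Date column would go through a character matrix and return the rates as strings. The names row_idx, col_idx, rate_cols, rate_mat and grid are illustrative and not from the original post.

```r
# Sample data from the question
RateData <- data.frame(
  Date    = c("2018-01-01", "2018-01-05", "2018-01-08", "2018-01-17"),
  Threeyr = c(1.25, 1.27, 1.29, 1.30),
  Fiveyr  = c(2.3, 2.31, 2.34, 2.4),
  Tenyr   = c(2.8, 2.89, 2.75, 2.6),
  PRIME   = c(4.0, 4, 4, 4.25),
  stringsAsFactors = FALSE
)
LoanData <- data.frame(
  OriginationDate = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-05",
                      "2018-01-08", "2018-01-08", "2018-01-17"),
  LNTYPE     = c(83, 101, 115, 83, 83, 105, 115),
  PriceIndex = c("Threeyr", "Fiveyr", "PRIME", "Threeyr", "Threeyr", "Fiveyr", "PRIME"),
  stringsAsFactors = FALSE
)

# Row of RateData that matches each loan's origination date
row_idx <- match(LoanData$OriginationDate, RateData$Date)

# Separate row and column vectors select every row/column combination,
# which is why the original attempt produced a 7 x 7 result
grid <- RateData[row_idx, match(LoanData$PriceIndex, colnames(RateData))]
dim(grid)

# Keep only the rate columns so the lookup table is a numeric matrix
rate_cols <- setdiff(colnames(RateData), "Date")
rate_mat  <- as.matrix(RateData[rate_cols])
col_idx   <- match(LoanData$PriceIndex, rate_cols)

# A two-column index matrix picks one (row, column) cell per loan
LoanData$Rate <- rate_mat[cbind(row_idx, col_idx)]
LoanData
```

Loans whose date or index name has no match get NA from match(), and a row of the index matrix that contains NA yields NA in the result, so unmatched loans stay visible instead of being dropped.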
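Answer 1 (score: 0)
You can use the reshape2 library (filter() below comes from dplyr, so that has to be loaded as well):
library(reshape2)
library(dplyr)  # provides the filter() used below
# Long format: one row per (Date, index) pair
x00 <- melt(RateData, id.vars = "Date")
# For each loan, pick the rate whose date and index name both match
LoanData$Rate <- sapply(seq_len(nrow(LoanData)), function(x) filter(x00, Date == as.character(LoanData$OriginationDate[x]) & variable == as.character(LoanData$PriceIndex[x]))$value)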
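Filtering the long table once per loan scans it again for every row, which may be slow with 200,000+ loans. One possible alternative, sketched here in base R, builds a composite key and resolves all loans with a single match(). It assumes each (Date, index) pair appears only once in RateData; the names rate_long, key_rate and key_loan are illustrative and not from the original answer.

```r
# Sample data from the question
RateData <- data.frame(
  Date    = c("2018-01-01", "2018-01-05", "2018-01-08", "2018-01-17"),
  Threeyr = c(1.25, 1.27, 1.29, 1.30),
  Fiveyr  = c(2.3, 2.31, 2.34, 2.4),
  Tenyr   = c(2.8, 2.89, 2.75, 2.6),
  PRIME   = c(4.0, 4, 4, 4.25),
  stringsAsFactors = FALSE
)
LoanData <- data.frame(
  OriginationDate = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-05",
                      "2018-01-08", "2018-01-08", "2018-01-17"),
  LNTYPE     = c(83, 101, 115, 83, 83, 105, 115),
  PriceIndex = c("Threeyr", "Fiveyr", "PRIME", "Threeyr", "Threeyr", "Fiveyr", "PRIME"),
  stringsAsFactors = FALSE
)

# Reshape the (small) rate table to long format: one row per date and index
rate_cols <- setdiff(names(RateData), "Date")
rate_long <- data.frame(
  Date       = rep(as.character(RateData$Date), times = length(rate_cols)),
  PriceIndex = rep(rate_cols, each = nrow(RateData)),
  Rate       = unlist(RateData[rate_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Composite keys: date and index name joined with a separator
key_rate <- paste(rate_long$Date, rate_long$PriceIndex, sep = "|")
key_loan <- paste(LoanData$OriginationDate, LoanData$PriceIndex, sep = "|")

# One vectorised lookup for all loans; unmatched keys give NA
LoanData$Rate <- rate_long$Rate[match(key_loan, key_rate)]
LoanData
```

Only the rate table is reshaped here, so the long table has one row per date and index regardless of how many loans there are.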
Answer 2 (score: 0)
Here is an attempt:
library(data.table)
#Find all the names in Ratedata that match the Priceindex per date
A=setDT(LoanData)[,.(names(RateData[-1])%in%PriceIndex,1:4),keyby=OriginationDate]
#Rearrange the data,then only take the Rates that you need
B=dcast(A,OriginationDate~V2,value.var = "V1")[-1]*RateData[-1]
#Reshape it to the required order after removing the rows whereby the rate is 0
data.frame(na.omit(`is.na<-`(C<-melt(cbind(RateData[1],B),1),C==0)),LoanData$LNTYPE)
         Date variable value LoanData.LNTYPE
1  2018-01-01        1  1.25              83
2  2018-01-05        1  1.27             101
3  2018-01-08        1  1.29             115
5  2018-01-01        2  2.30              83
7  2018-01-08        2  2.34              83
13 2018-01-01        4  4.00             105
16 2018-01-17        4  4.25             115