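The R package "termstrc", for term structure estimation, is a very useful tool, but it requires the data to be set up in a particularly awkward format: lists within lists.
Question: what is the best way to prepare and shape data, either outside R or within R, to create the repeated sublist format needed to run the "dyncouponbonds" function?
The "dyncouponbonds" command requires the data to be set up in repeated sublists, where a list of bonds and the time-invariant characteristics of those bonds (call it the "bond list") is attached to some time-varying characteristics of those bonds (prices and accrued interest), and this is replicated for times t+1 through T.
Below is an example of the list format for one period. The "dyncouponbonds" command requires this format to be replicated for all T periods within an umbrella list. ISIN, MATURITYDATE, ISSUEDATE and COUPONRATE are the same in every period. PRICE, ACCRUED, CASHFLOWS and TODAY differ from period to period.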
R> str(govbonds$GERMANY)
List of 8
$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" ...
$ MATURITYDATE:Class 'Date' num [1:52] 13924 13952 13980 14043 ...
$ ISSUEDATE :Class 'Date' num [1:52] 11913 13215 12153 13298 ...
$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 ...
$ PRICE : num [1:52] 100 99.9 99.8 99.8 ...
$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 ...
$ CASHFLOWS :List of 3
..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" ...
..$ CF : num [1:384] 104 103 103 103 ...
..$ DATE:Class 'Date' num [1:384] 13924 13952 13980 14043 ...
$ TODAY :Class 'Date' num 13908
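To make the target shape concrete, here is a minimal base R sketch that builds one period by hand. All bond data are invented (the ISINs, dates, prices and the group name TOYLAND are hypothetical), and the class assignment only mimics the structure shown by str() above; whether termstrc accepts such a toy object for estimation has not been checked here.

```r
# Toy data: two made-up bonds (all values are invented for illustration).
today <- as.Date("2008-01-30")
bonds <- data.frame(
  ISIN         = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2009-01-30", "2010-01-30")),
  ISSUEDATE    = as.Date(c("2004-01-30", "2005-01-30")),
  COUPONRATE   = c(0.04, 0.03),
  PRICE        = c(100.5, 99.2),
  ACCRUED      = c(0, 0),
  stringsAsFactors = FALSE
)

# One row per future cash flow, in "long" format.
flows <- data.frame(
  ISIN = c("XX0000000001", "XX0000000002", "XX0000000002"),
  CF   = c(104, 3, 103),
  DATE = as.Date(c("2009-01-30", "2009-01-30", "2010-01-30")),
  stringsAsFactors = FALSE
)

# One period: the bond-level vectors, plus the CASHFLOWS sublist and TODAY.
one_period <- c(as.list(bonds), list(CASHFLOWS = as.list(flows), TODAY = today))

# Wrap it in a named list with one element per bond group.
toy <- structure(list(TOYLAND = one_period), class = "couponbonds")
str(unclass(toy)$TOYLAND, max.level = 1)
```

The as.list() calls turn each data frame into a plain list of column vectors, which is the same layout the str() output shows.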
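Answer 0 (score: 4)
This is a fairly advanced data manipulation question. R has many powerful data manipulation tools, and you don't need to leave R to prepare the (admittedly rather obtuse) dyncouponbonds object. In fact you probably shouldn't, because getting the structure out of another language and then turning it into dyncouponbonds would only be more work.
The first thing I would make sure of is that you are very comfortable with the lapply function. You are going to make heavy use of it. You will use it to create a list of couponbonds objects, which is all that dyncouponbonds really is. Creating the couponbonds objects is a bit harder, though, mainly because the CASHFLOWS sublist needs every cash flow tagged with the bond's ISIN and with the date of the cash flow. For this you will use lapply and some fairly advanced subscripting. The subset function will also come in handy.
This question depends heavily on where you get your data from. Getting it out of Bloomberg is not trivial, mainly because you need to go back in history using the BDS function with the "DES_CASH_FLOW" field to get the cash flows for each bond. I say history because if you are using dyncouponbonds I assume you want to do historical yield curve analysis. You need to override the BDS function's "SETTLE_DT" field with the value you get for the bond from the BDP function and the "FIRST_SETTLE_DT" field, so that you get all the cash flows from the start of the bond's life (otherwise it only returns those from today onwards, which is no good for historical analysis). But I digress. If you are not using Bloomberg, I don't know where you will get this data from.
Then you need the static data for each bond, i.e. maturity, ISIN, coupon rate and issue date. And you need historical price and accrued interest data. Again with Bloomberg, you use the BDP function with the fields you will see in the code below, and the history function BDH, which I have wrapped as bbdh. Assuming again that you are a Bloomberg user, here is the code: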
bbGetCountry <- function(cCode, up = FALSE) {
  # Get all the data we need for one country out of Bloomberg, and update it if necessary.
  # Relies on objects defined elsewhere: conn (an open Rbbg connection), histStartDate,
  # BBcurveIDs, bbStaticDataFields, bbHistoricDataFields and the helper wdaylist().
  if (up == TRUE) startDate <- as.Date("2012-01-01") else startDate <- histStartDate
  # first get all the curve members for history
  wdays <- wdaylist(startDate, Sys.Date()) # create the list of working days from startDate
  actives <- lapply(wdays, function(x) {
    bds(conn, BBcurveIDs[cCode], "CURVE_MEMBERS", override_fields = "CURVE_DATE",
        override_values = format(x, "%Y%m%d"))
  })
  names(actives) <- wdays
  uniqueActives <- unique(unlist(actives)) # there will be plenty of duplicates; get rid of them
  # now get the unchanging bond data
  staticData <- bdp(conn, uniqueActives, bbStaticDataFields)
  # now get the cash flow data, from each bond's first settlement date onwards
  cfData <- lapply(uniqueActives, function(x) {
    bds(conn, x, "DES_CASH_FLOW_ADJ", override_fields = "SETTLE_DT",
        override_values = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d"))
  })
  names(cfData) <- uniqueActives
  # now for historic data: one xts object per field
  historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
  names(historicData) <- bbHistoricDataFields # put the names in, otherwise we get a numbered list
  allDates <- as.Date(index(historicData$LAST_PRICE)) # all the dates we will find settlement dates for, for all bonds. No POSIX
  # save all the pieces to disk
  save(actives, file = paste("data/", cCode, "actives.dat", sep = ""))
  save(staticData, file = paste("data/", cCode, "staticData.dat", sep = ""))
  save(cfData, file = paste("data/", cCode, "cfData.dat", sep = ""))
  save(historicData, file = paste("data/", cCode, "historicData.dat", sep = ""))
  #save(settleDates, file = paste("data/", cCode, "settleDates.dat", sep = ""))
  # publish the result in the global environment as e.g. "SPdata"
  assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,
                                              historicData = historicData), pos = 1)
}
The bbdh function I used above is a wrapper around the bdh function of the Rbbg library, and looks like this:
library(reshape2) # dcast()
library(xts)      # xts()

bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
  # Get daily data for secs from Bloomberg, going back `years` years unless startDate is given.
  # Relies on conn, an open Rbbg connection defined elsewhere.
  if (is.null(startDate)) startDate <- Sys.Date() - years * 365.25
  if (inherits(startDate, "Date")) startDate <- format(startDate, "%Y%m%d") # convert Date to a Bloomberg date string
  if (nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d") # character string in another format, e.g. "2012-01-01"
  rawd <- bdh(conn, secs, flds, startDate, always.display.tickers = TRUE, include.non.trading.days = TRUE,
              option_names = c("nonTradingDayFillOption", "nonTradingDayFillMethod"),
              option_values = c("NON_TRADING_WEEKDAYS", "PREVIOUS_VALUE"))
  rawd <- dcast(rawd, date ~ ticker) # one column per ticker
  colnames(rawd) <- sub(" .*", "", colnames(rawd)) # remove the Govt, Curncy bits from the Bloomberg tickers
  return(xts(rawd[, -1], order.by = as.POSIXct(rawd[, 1])))
}
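The reshaping step inside bbdh (long rows of ticker, date and value turned into one column per bond) can be tried without a Bloomberg connection. The sketch below swaps in base R's tapply for dcast, and the input data frame is made up to resemble what the code above assumes bdh returns; the real return format may differ.

```r
# Made-up long-format data: one row per (ticker, date) pair.
long <- data.frame(
  ticker = c("AAA Govt", "BBB Govt", "AAA Govt", "BBB Govt"),
  date   = as.Date(c("2012-01-02", "2012-01-02", "2012-01-03", "2012-01-03")),
  PX_BID = c(100.1, 99.5, 100.3, 99.4),
  stringsAsFactors = FALSE
)

# Pivot to a date-by-ticker matrix (each cell holds exactly one value here).
wide <- tapply(long$PX_BID, list(as.character(long$date), long$ticker),
               function(v) v[1])

# Strip everything after the first space, as bbdh does with sub().
colnames(wide) <- sub(" .*", "", colnames(wide))
wide
```

The result is a plain matrix with dates as row names; bbdh goes one step further and wraps the wide table in an xts object indexed by date.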
The country codes come from a structure that associates two-letter names with the Bloomberg yield curve descriptions:
BBcurveIDs <- list(PO = "YCGT0084 Index", #Portugal
DE = "YCGT0016 Index",
FR = "YCGT0014 Index",
SP = "YCGT0061 Index",
IT = "YCGT0040 Index",
AU = "YCGT0001 Index", #Australia
AS = "YCGT0063 Index", #Austria
JP = "YCGT0018 Index",
GB = "YCGT0022 Index",
HK = "YCGT0095 Index",
CA = "YCGT0007 Index",
CH = "YCGT0082 Index",
NO = "YCGT0078 Index",
SE = "YCGT0021 Index",
IR = "YCGT0062 Index",
BE = "YCGT0006 Index",
NE = "YCGT0020 Index",
ZA = "YCGT0090 Index",
PL = "YCGT0177 Index", #Poland
MX = "YCGT0251 Index")
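A small aside on how a lookup in this list behaves: single brackets return a one-element list, while double brackets return the string itself. The two-entry list below is just a cut-down copy for illustration. Whether a particular Bloomberg client function accepts the one-element list is not something this sketch can confirm.

```r
ids <- list(DE = "YCGT0016 Index", FR = "YCGT0014 Index")

one_bracket <- ids["DE"]    # still a list of length 1, named "DE"
two_bracket <- ids[["DE"]]  # the bare character string

str(one_bracket)
str(two_bracket)
```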
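So bbGetCountry creates four data structures, called actives, staticData, cfData and historicData, using the following Bloomberg fields (bbDynamicDataFields is listed for completeness; the function shown above does not query it):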
bbStaticDataFields <- c("ID_ISIN",
"ISSUER",
"COUPON",
"CPN_FREQ",
"MATURITY",
"CALC_TYP_DES", # pricing calculation type
"INFLATION_LINKED_INDICATOR", # N or Y, in R returned as TRUE or FALSE
"ISSUE_DT",
"FIRST_SETTLE_DT",
"PX_METHOD", # PRC or YLD
"PX_DIRTY_CLEAN", # market convention dirty or clean
"DAYS_TO_SETTLE",
"CALLABLE",
"MARKET_SECTOR_DES",
"INDUSTRY_SECTOR",
"INDUSTRY_GROUP",
"INDUSTRY_SUBGROUP")
bbDynamicDataFields <- c("IS_STILL_CALLABLE",
"RTG_MOODY",
"RTG_MOODY_WATCH",
"RTG_SP",
"RTG_SP_WATCH",
"RTG_FITCH",
"RTG_FITCH_WATCH")
bbHistoricDataFields <- c("PX_BID",
"PX_ASK",
#"PX_CLEAN_BID",
#"PX_CLEAN_ASK",
"PX_DIRTY_BID",
"PX_DIRTY_ASK",
#"ASSET_SWAP_SPD_BID",
#"ASSET_SWAP_SPD_ASK",
"LAST_PRICE",
#"SETTLE_DT",
"YLD_YTM_MID")
Now you are ready to create couponbonds objects from all these data structures:
createCouponBonds <- function(cCode, dateString) {
  cdata <- get(paste(cCode, "data", sep = "")) # get the data set created by bbGetCountry
  today <- as.Date(dateString)
  # settlement date: three weekdays after today (holidays are not considered)
  settleDate <- today
  daycount <- 0
  while (daycount < 3) {
    settleDate <- settleDate + 1
    # "%u" is the weekday number (6 = Saturday, 7 = Sunday) and does not depend on the locale
    if (!(format(settleDate, "%u") %in% c("6", "7"))) daycount <- daycount + 1
  }
  goodbonds <- subset(cdata$staticData, COUPON != 0 & INFLATION_LINKED_INDICATOR == FALSE) # drop zero-coupon paper (e.g. T-bills) and inflation-linked bonds
  goodbonds <- goodbonds[rownames(goodbonds) %in% cdata$actives[[dateString]][, 1], ] # keep only curve members on this date
  stripnames <- sapply(strsplit(rownames(goodbonds), " "), function(x) x[1])
  pxbid <- cdata$historicData$PX_BID[today, stripnames]
  pxask <- cdata$historicData$PX_ASK[today, stripnames]
  pxdbid <- cdata$historicData$PX_DIRTY_BID[today, stripnames]
  pxdask <- cdata$historicData$PX_DIRTY_ASK[today, stripnames]
  price <- as.numeric((pxbid + pxask) / 2) # clean mid price
  accrued <- as.numeric(pxdbid - pxbid)    # dirty bid minus clean bid
  # one list per bond: ISIN, cash flow and date for every flow from today onwards
  cashflows <- lapply(rownames(goodbonds), function(x) {
    goodflows <- cdata$cfData[[x]][as.Date(cdata$cfData[[x]][, "Date"]) >= today, ]
    #gfstipnames <- sapply(strsplit(rownames(goodflows), " "), function(x) x[1]) dunno if I need this
    isin <- rep(cdata$staticData[x, "ID_ISIN"], nrow(goodflows))
    cf <- apply(goodflows[, 2:3], 1, sum) / 10000 # sum columns 2 and 3 of the cash flow table and rescale
    dt <- as.Date(goodflows[, 1])
    return(list(isin = isin, cf = cf, dt = dt))
  })
  # flatten the per-bond lists into three parallel vectors
  isinvec <- unlist(lapply(cashflows, function(x) x$isin))
  cfvec <- as.numeric(unlist(lapply(cashflows, function(x) x$cf)))
  datevec <- unlist(lapply(cashflows, function(x) x$dt)) # unlist() drops the Date class; restored below
  govbonds <- list(ISIN = goodbonds$ID_ISIN,
                   MATURITYDATE = as.Date(goodbonds$MATURITY),
                   ISSUEDATE = as.Date(goodbonds$FIRST_SETTLE_DT),
                   COUPONRATE = as.numeric(goodbonds$COUPON) / 100,
                   PRICE = price,
                   ACCRUED = accrued,
                   CASHFLOWS = list(ISIN = isinvec, CF = cfvec,
                                    DATE = as.Date(datevec, origin = "1970-01-01")),
                   TODAY = settleDate)
  govbonds <- list(govbonds)
  names(govbonds) <- cCode
  class(govbonds) <- "couponbonds"
  return(govbonds)
}
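The settlement-date loop at the top of createCouponBonds can be pulled out and tested on its own. The helper name add_weekdays is hypothetical, and like the loop it ignores public holidays, so treat it as a sketch rather than a proper business-day calendar.

```r
# Add n weekdays (Mon-Fri) to a date; holidays are ignored in this sketch.
add_weekdays <- function(from, n) {
  d <- as.Date(from)
  while (n > 0) {
    d <- d + 1
    # "%u" gives the weekday number (1 = Monday ... 7 = Sunday),
    # which does not depend on the session's locale.
    if (as.integer(format(d, "%u")) <= 5) n <- n - 1
  }
  d
}

add_weekdays("2012-01-05", 3)  # Thursday + 3 weekdays: 2012-01-10 (Tuesday)
```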
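Take a close look at the cashflows <- lapply(...) call, because that is where the sublists get created and it is the core of the answer to your question. How this is done depends very much on how you decide to construct the intermediate data structures; I have given you one possibility. I realise my answer is complicated, but so is the question. Not all the code you need is in this answer either: some helper functions are missing, but I am happy to supply them if you contact me. The skeleton of the core functionality is here, and really most of the problem lies in getting the data in the first place and structuring it appropriately. You surmised correctly that some data for each bond are static, some are dynamic and some are historic. The intermediate data structures therefore have different dimensions for the different parts of a couponbonds object. How you represent that is up to you; I have used a separate list or data frame for each, linked by bond ID where necessary.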
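To see that step in isolation, here is a self-contained sketch of the same idea on invented data: filter each bond's cash flows to those on or after the valuation date, then flatten the per-bond pieces into the three parallel vectors that CASHFLOWS needs. The column names Coupon and Principal, the tickers and the ISINs are all assumptions made for the example.

```r
# Made-up per-bond cash-flow tables, keyed by a ticker-like name.
cf_by_bond <- list(
  "AAA Govt" = data.frame(Date = as.Date(c("2012-06-01", "2013-06-01")),
                          Coupon = c(4, 4), Principal = c(0, 100)),
  "BBB Govt" = data.frame(Date = as.Date(c("2011-09-01", "2012-09-01")),
                          Coupon = c(3, 3), Principal = c(0, 100))
)
isin_lookup <- c("AAA Govt" = "XX0000000001", "BBB Govt" = "XX0000000002")
today <- as.Date("2012-01-02")

# Step 1: per bond, keep only flows on or after `today`.
per_bond <- lapply(names(cf_by_bond), function(id) {
  flows <- subset(cf_by_bond[[id]], Date >= today)
  list(isin = rep(isin_lookup[[id]], nrow(flows)),
       cf   = flows$Coupon + flows$Principal,
       dt   = flows$Date)
})

# Step 2: flatten the per-bond pieces into three parallel vectors.
CASHFLOWS <- list(
  ISIN = unlist(lapply(per_bond, `[[`, "isin")),
  CF   = unlist(lapply(per_bond, `[[`, "cf")),
  # do.call(c, ...) keeps the Date class, which unlist() would drop.
  DATE = do.call(c, lapply(per_bond, `[[`, "dt"))
)
str(CASHFLOWS)
```

The only subtle point is the date vector: unlist() returns bare numbers, so either combine the dates with c() as here, or convert back with as.Date() and an explicit origin.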
The function above takes a date string, so you can run it for every historical data point with lapply and, hey presto, dyncouponbonds:
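# dodates: a vector of date strings to process (not defined in this answer)
spl <- lapply(dodates, function(x) createCouponBonds("SP", x))
names(spl) <- sapply(spl, function(x) format(x$SP$TODAY)) # name each period by its date
class(spl) <- "dyncouponbonds"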
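There you go. You asked for it...
If you are not using Bloomberg, your input data structures will be quite different, but as I said at the start, get familiar with lapply and sapply. Obviously there are many other ways to go about this, but the above works for Bloomberg. If you understand this code, you will certainly know what you are doing for other data sources.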
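For a non-Bloomberg source, the same pattern might look like the sketch below, which starts from three plain data frames such as you could read from CSV files. Everything here is an assumption made for illustration: the data frame layouts, the helper name build_period, the group name TOY and all the numbers. It reproduces the nested list shape only and has not been run through termstrc's estimation functions.

```r
# Made-up inputs (all values invented).
static <- data.frame(
  ISIN = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2013-06-01", "2012-09-01")),
  ISSUEDATE = as.Date(c("2008-06-01", "2007-09-01")),
  COUPONRATE = c(0.04, 0.03),
  stringsAsFactors = FALSE
)
prices <- data.frame(
  TODAY = as.Date(rep(c("2012-01-02", "2012-01-03"), each = 2)),
  ISIN = rep(static$ISIN, times = 2),
  PRICE = c(101.0, 100.2, 101.1, 100.1),
  ACCRUED = c(2.35, 1.01, 2.36, 1.02),
  stringsAsFactors = FALSE
)
flows <- data.frame(
  ISIN = c("XX0000000001", "XX0000000001", "XX0000000002"),
  CF = c(4, 104, 103),
  DATE = as.Date(c("2012-06-01", "2013-06-01", "2012-09-01")),
  stringsAsFactors = FALSE
)

# Build one couponbonds-shaped object for a single date.
build_period <- function(day, group = "TOY") {
  px <- prices[prices$TODAY == day, ]
  px <- px[match(static$ISIN, px$ISIN), ]  # align prices with the static rows
  fl <- flows[flows$DATE >= day, ]         # only cash flows still to come
  period <- c(as.list(static),
              list(PRICE = px$PRICE, ACCRUED = px$ACCRUED,
                   CASHFLOWS = as.list(fl), TODAY = day))
  structure(setNames(list(period), group), class = "couponbonds")
}

# One period per pricing date, collected into the umbrella list.
days <- sort(unique(prices$TODAY))
dyn <- lapply(seq_along(days), function(i) build_period(days[i]))
names(dyn) <- format(days)
class(dyn) <- "dyncouponbonds"
```

Keeping the static, price and cash-flow tables separate and joining them by ISIN inside the per-date builder mirrors the design used in the Bloomberg code above.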
Finally, note that the Rbbg package from findata.org is used to connect to Bloomberg.
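Answer 1 (score: 0)
My 2 cents: I have been trying to get this working with the new Rblpapi. I still have some issues with the createCouponBonds part, but I believe the other functions return correctly. This won't solve the whole problem, but it is at least a partial fix. BBcurveIDs, bbStaticDataFields, bbDynamicDataFields and bbHistoricDataFields are the same as above.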
bbGetCountry <- function(cCode, up = FALSE) {
  # Uses Rblpapi (bds, bdp; needs an open blpConnect() session) and bizdays (Calendar, bizseq).
  if (up == TRUE) startDate <- as.Date("2016-01-01") else startDate <- histStartDate
  cal <- Calendar(weekdays=c("saturday", "sunday"))
  wdays <- as.list(bizseq(startDate, Sys.Date(), cal))
  actives <- lapply(wdays, function(x) {
    bds(BBcurveIDs[cCode][[1]], "CURVE_MEMBERS", override = c(CURVE_DATE=format(x, "%Y%m%d")))
  })
  names(actives) <- wdays
  uniqueActives <- unique(unlist(actives))
  staticData <- bdp(uniqueActives, bbStaticDataFields)
  cfData <- lapply(uniqueActives, function(x) {
    bds(x, "DES_CASH_FLOW_ADJ", override = c(SETTLE_DT = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d")))
  })
  names(cfData) <- uniqueActives
  historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
  names(historicData) <- bbHistoricDataFields
  allDates <- as.Date(index(historicData$LAST_PRICE))
  save(actives, file = paste("data_", cCode, "actives.dat", sep = ""))
  save(staticData, file = paste("data_", cCode, "staticData.dat", sep = ""))
  save(cfData, file = paste("data_", cCode, "cfData.dat", sep = ""))
  save(historicData, file = paste("data_", cCode, "historicData.dat", sep = ""))
  #save(settleDates, file = paste("data_", cCode, "settleDates.dat", sep = ""))
  assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,
                                              historicData = historicData), pos = 1)
}
And the bbdh function:
bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
  # Uses Rblpapi (bdh), plyr (ldply), reshape2 (dcast) and xts.
  if (is.null(startDate)) startDate <- Sys.Date() - years * 365.25
  # As far as I know, Rblpapi's bdh() wants start.date as a Date, so coerce strings such as "2016-01-01"
  if (!inherits(startDate, "Date")) startDate <- as.Date(startDate)
  rawd <- bdh(secs, flds,
              startDate,
              include.non.trading.days = FALSE,
              options = structure(c("PREVIOUS_VALUE", "NON_TRADING_WEEKDAYS"),
                                  names = c("nonTradingDayFillMethod","nonTradingDayFillOption")))
  rawd <- ldply(rawd, data.frame) # stack the per-security data frames into one long table
  colnames(rawd) <- c("sec", "date", "fld")
  rawd <- dcast(rawd, date ~ sec, value.var="fld") # one column per security
  colnames(rawd) <- gsub(" Corp", "", colnames(rawd))
  return(xts(rawd[,-1], order.by=rawd[,1]))
}