Combining time series objects and lists: package "termstrc"

Date: 2012-11-22 23:30:11

Tags: r list sublist quantitative-finance

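The R package "termstrc", used for term structure estimation, is an extremely useful tool, but it requires the data to be set up in a particularly awkward format: lists within lists.

Question: what is the best way to prepare and shape the data, either outside R or within R, to create the repeated-sublist format required to run the "dyncouponbonds" function?

The "dyncouponbonds" command requires the data to be set up in repeated sublists, where a list of bonds and their time-invariant features (let's call it the "bondlist") has some time-varying features of those bonds (prices and accrued interest) attached to it, and is replicated for times t+1 to T.

Here is an example of the list format for one period. The "dyncouponbonds" command requires this format to be replicated for all T periods within an umbrella list. ISIN, MATURITYDATE, ISSUEDATE and COUPONRATE are the same in every period. PRICE, ACCRUED, CASHFLOWS and TODAY differ from period to period.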

R> str(govbonds$GERMANY)

List of 8
$ ISIN : chr [1:52] "DE0001141414" "DE0001137131" "DE0001141422" ...
$ MATURITYDATE:Class 'Date' num [1:52] 13924 13952 13980 14043 ...
$ ISSUEDATE :Class 'Date' num [1:52] 11913 13215 12153 13298 ...
$ COUPONRATE : num [1:52] 0.0425 0.03 0.03 0.0325 ...
$ PRICE : num [1:52] 100 99.9 99.8 99.8 ...
$ ACCRUED : num [1:52] 4.09 2.66 2.43 2.07 ...
$ CASHFLOWS :List of 3
..$ ISIN: chr [1:384] "DE0001141414" "DE0001137131" "DE0001141422" ...
..$ CF : num [1:384] 104 103 103 103 ...
..$ DATE:Class 'Date' num [1:384] 13924 13952 13980 14043 ...
$ TODAY :Class 'Date' num 13908
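
To make the target shape concrete, here is a minimal sketch that builds a one-period list with the same eight elements by hand. The ISINs, prices, accrued interest and cash flows are all made up, and the country label "TOY" is an assumption; only the element names come from the str() output above.

```r
# One period for two made-up bonds; all values are illustrative only
today <- as.Date("2012-11-22")

one_period <- list(
  ISIN         = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2014-01-15", "2015-01-15")),
  ISSUEDATE    = as.Date(c("2009-01-15", "2010-01-15")),
  COUPONRATE   = c(0.04, 0.03),
  PRICE        = c(103.1, 104.2),   # time-varying
  ACCRUED      = c(3.41, 2.56),     # time-varying
  CASHFLOWS    = list(              # one entry per remaining payment, grouped by bond
    ISIN = c("XX0000000001", "XX0000000001",
             "XX0000000002", "XX0000000002", "XX0000000002"),
    CF   = c(4, 104, 3, 3, 103),
    DATE = as.Date(c("2013-01-15", "2014-01-15",
                     "2013-01-15", "2014-01-15", "2015-01-15"))
  ),
  TODAY        = today
)

# Wrap it under a country-style name and tag it the way the package's data sets are tagged
bonds <- list(TOY = one_period)
class(bonds) <- "couponbonds"
str(unclass(bonds)$TOY)
```

Note that the three CASHFLOWS vectors are longer than the per-bond vectors, because each bond contributes one entry per remaining payment.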

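2 answers:

Answer 0 (score: 4)

This is a fairly advanced data manipulation question. R has many powerful data manipulation tools, and you don't need to leave R to prepare the (admittedly rather obtuse) dyncouponbonds object. In fact you really shouldn't, because building the structure in another language and then turning it into a dyncouponbonds object would only be more work.

The first thing I would make sure of is that you are very comfortable with the lapply function. You are going to make full use of it. You will use it to create a list of couponbonds objects, which is all a dyncouponbonds object really is. Creating the couponbonds objects is a bit harder, though, mainly because the CASHFLOWS sublist needs every cash flow tagged with the ISIN of its bond and with the date of the cash flow. For that you will use lapply and some fairly advanced subscripting. The subset function will also come in handy.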
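
As a rough illustration of that lapply-plus-subset pattern, the sketch below flattens per-bond payment schedules into the three parallel CASHFLOWS vectors. The input list cf_by_bond and its column names (Date, Amount) are hypothetical, not something termstrc or Bloomberg provides.

```r
# Hypothetical input: one data frame of scheduled payments per bond, keyed by ISIN
cf_by_bond <- list(
  XX0000000001 = data.frame(Date   = as.Date(c("2012-01-15", "2013-01-15", "2014-01-15")),
                            Amount = c(4, 4, 104)),
  XX0000000002 = data.frame(Date   = as.Date(c("2013-01-15", "2014-01-15")),
                            Amount = c(3, 103))
)
today <- as.Date("2012-11-22")

# For each bond keep only the payments still to come and tag every row with the ISIN
per_bond <- lapply(names(cf_by_bond), function(id) {
  future <- subset(cf_by_bond[[id]], Date >= today)
  data.frame(ISIN = rep(id, nrow(future)),
             CF   = future$Amount,
             DATE = future$Date,
             stringsAsFactors = FALSE)
})

# Stack the per-bond pieces and split them into the three parallel vectors
flat <- do.call(rbind, per_bond)
CASHFLOWS <- list(ISIN = flat$ISIN, CF = flat$CF, DATE = flat$DATE)
str(CASHFLOWS)
```

Going through rbind on data frames keeps the Date class of the DATE column intact, which is one reason to prefer it over unlist here.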

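This problem depends heavily on where you get the data from. Getting it out of Bloomberg is not trivial, mainly because you need to go back in history, using the BDS function with the "DES_CASH_FLOW" field to get the cash flows for each bond. I say history because if you are using dyncouponbonds I assume you want to do historical yield curve analysis. You need to override the "SETTLE_DT" field of the BDS function with the value for the bond that you get from the BDP function and the field "FIRST_SETTLE_DT", so that you get all the cash flows from the start of the bond's life (otherwise it only returns those from today onwards, which is no good for historical analysis). But I digress. If you are not using Bloomberg, I don't know where you will get this data from.

Then you need the static data for each bond, i.e. maturity, ISIN, coupon rate and issue date. And you need the historical price and accrued interest data. Again with Bloomberg, you would use the BDP function with the fields you will see in the code below, and the historical data function BDH, which I have wrapped as bbdh. Assuming again that you are a Bloomberg user, here is the code: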

bbGetCountry <- function(cCode, up = FALSE) {
# this function is going to get all the data out of bloomberg that we need for a
# country, and update it if necessary
    if (up == TRUE) startDate <- as.Date("2012-01-01") else startDate <- histStartDate 
    # first get all the curve members for history
    wdays <- wdaylist(startDate, Sys.Date()) # create the list of working days from startdate
    actives <- lapply(wdays, function(x) { 
        bds(conn, BBcurveIDs[cCode], "CURVE_MEMBERS", override_fields = "CURVE_DATE",
        override_values = format(x, "%Y%m%d"))
    })
    names(actives) <- wdays
    uniqueActives <- unique(unlist(actives)) # there will be puhlenty duplicates. Get rid of them
    # now get the unchanging bond data
    staticData <- bdp(conn, uniqueActives, bbStaticDataFields)
    # now get the cash flowdata
    cfData <- lapply(uniqueActives, function(x) {
        bds(conn, x, "DES_CASH_FLOW_ADJ", override_fields = "SETTLE_DT", 
            override_values = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d"))
    })
    names(cfData) <- uniqueActives
    # now for historic data
    historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
    names(historicData) <- bbHistoricDataFields   # put the names in otherwise we get a numbered list
    allDates <- as.Date(index(historicData$LAST_PRICE)) # all the dates we will find settlement dates for for all bonds. No posix
    save(actives, file = paste("data/", cCode, "actives.dat", sep = ""))      #save all the files now
    save(staticData, file = paste("data/", cCode, "staticData.dat", sep = ""))
    save(cfData, file = paste("data/", cCode, "cfData.dat", sep = ""))
    save(historicData, file = paste("data/", cCode, "historicData.dat", sep = ""))
    #save(settleDates, file = paste("data/", cCode, "settleDates.dat", sep = ""))
    assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,    #
        historicData = historicData), pos = 1)

}
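
The wdaylist helper called above is not shown in the answer. A possible stand-in, assuming it only needs to return the weekdays between two dates as a Date vector (public holidays are ignored here, which the original helper may or may not do):

```r
# Hypothetical replacement for the missing wdaylist() helper
wdaylist <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  # POSIXlt weekday numbers: 0 = Sunday, 1 = Monday, ..., 6 = Saturday (locale independent)
  wday <- as.POSIXlt(days)$wday
  days[wday >= 1 & wday <= 5]
}

wd <- wdaylist("2012-11-19", "2012-11-26")
format(wd)
```

Returning a Date vector rather than a list means names(actives) <- wdays yields names such as "2012-11-22", which is the form createCouponBonds later uses to look up the active bonds for a given date string.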

The bbdh function I used above is a wrapper around the bdh function of the Rbbg library and looks like this:

bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
        #this function gets secs over years from bloomberg daily data
            if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
            if(inherits(startDate, "Date")) startDate <- format(startDate, "%Y%m%d") #convert date classes to bb string
            if(nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d") # if we've been passed wrong format character string 
            rawd <- bdh(conn, secs, flds, startDate, always.display.tickers = TRUE, include.non.trading.days = TRUE,
                option_names = c("nonTradingDayFillOption", "nonTradingDayFillMethod"),
                option_values = c("NON_TRADING_WEEKDAYS", "PREVIOUS_VALUE"))
            rawd <- dcast(rawd, date ~ ticker) #put into columns
            colnames(rawd) <- sub(" .*", "", colnames(rawd)) #remove the govt, currncy bits from bb tickers
            return(xts(rawd[, -1], order.by = as.POSIXct(rawd[, 1])))
        }
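
For readers without a terminal, the reshaping step inside bbdh (long rows to one column per bond, then stripping the ticker suffix) can be sketched in base R on made-up data. The tickers and prices below are invented, and tapply stands in for dcast; the result is a plain matrix rather than an xts object.

```r
# Made-up long-format prices shaped like (date, ticker, value) rows
rawd <- data.frame(
  date   = as.Date(c("2012-11-21", "2012-11-21", "2012-11-22", "2012-11-22")),
  ticker = c("AA000001 Govt", "BB000002 Govt", "AA000001 Govt", "BB000002 Govt"),
  value  = c(100.1, 99.5, 100.3, 99.4),
  stringsAsFactors = FALSE
)

# One row per date, one column per ticker (each cell holds a single observation)
wide <- tapply(rawd$value, list(format(rawd$date), rawd$ticker), function(v) v[1])

# Drop everything after the first space, as the sub() call in bbdh does
colnames(wide) <- sub(" .*", "", colnames(wide))
wide
```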

The country codes come from a structure that links two-letter names to the Bloomberg yield curve descriptions:

BBcurveIDs  <- list(PO = "YCGT0084 Index", #Portugal
                    DE = "YCGT0016 Index", 
                    FR = "YCGT0014 Index", 
                    SP = "YCGT0061 Index",
                    IT = "YCGT0040 Index",
                    AU = "YCGT0001 Index", #Australia
                    AS = "YCGT0063 Index", #Austria
                    JP = "YCGT0018 Index",
                    GB = "YCGT0022 Index",
                    HK = "YCGT0095 Index",
                    CA = "YCGT0007 Index",
                    CH = "YCGT0082 Index",
                    NO = "YCGT0078 Index",
                    SE = "YCGT0021 Index",
                    IR = "YCGT0062 Index",
                    BE = "YCGT0006 Index",
                    NE = "YCGT0020 Index",
                    ZA = "YCGT0090 Index",
                    PL = "YCGT0177 Index", #Poland
                    MX = "YCGT0251 Index")

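So bbGetCountry creates four different data structures, called actives, staticData, cfData and historicData, drawing on the following Bloomberg fields (bbDynamicDataFields is defined here as well, but the code shown does not use it):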

bbStaticDataFields <- c("ID_ISIN",
                      "ISSUER", 
                      "COUPON",
                      "CPN_FREQ",
                      "MATURITY",
                      "CALC_TYP_DES",                    # pricing calculation type 
                      "INFLATION_LINKED_INDICATOR",     # N or Y, in R returned as TRUE or FALSE
                      "ISSUE_DT",
                      "FIRST_SETTLE_DT",
                      "PX_METHOD",                      # PRC or YLD 
                      "PX_DIRTY_CLEAN",                 # market convention dirty or clean
                      "DAYS_TO_SETTLE",
                      "CALLABLE",
                      "MARKET_SECTOR_DES",
                      "INDUSTRY_SECTOR",
                      "INDUSTRY_GROUP",
                      "INDUSTRY_SUBGROUP")

bbDynamicDataFields <- c("IS_STILL_CALLABLE",
                        "RTG_MOODY",
                        "RTG_MOODY_WATCH",
                        "RTG_SP",
                        "RTG_SP_WATCH",
                        "RTG_FITCH",
                        "RTG_FITCH_WATCH")

bbHistoricDataFields <- c("PX_BID",
                          "PX_ASK",
                          #"PX_CLEAN_BID",
                          #"PX_CLEAN_ASK",
                          "PX_DIRTY_BID",
                          "PX_DIRTY_ASK",
                          #"ASSET_SWAP_SPD_BID",
                          #"ASSET_SWAP_SPD_ASK",
                          "LAST_PRICE",
                          #"SETTLE_DT",
                          "YLD_YTM_MID")

Now you are ready to create couponbonds objects from all these data structures:

createCouponBonds <- function(cCode, dateString) {
    cdata <- get(paste(cCode, "data", sep = "")) # get the data set
    today <- as.Date(dateString)
    settleDate <- today
    daycount <- 0
    while(daycount < 3) {
        settleDate <- settleDate + 1
        if (!(weekdays(settleDate) %in% c("Saturday", "Sunday"))) daycount <- daycount + 1
    }
    goodbonds <- subset(cdata$staticData, COUPON != 0 & INFLATION_LINKED_INDICATOR == FALSE) # clean out zeros and tbills
    goodbonds <- goodbonds[rownames(goodbonds) %in% cdata$actives[[dateString]][, 1], ]
    stripnames <- sapply(strsplit(rownames(goodbonds), " "), function(x) x[1])
    pxbid <- cdata$historicData$PX_BID[today, stripnames]
    pxask <- cdata$historicData$PX_ASK[today, stripnames]
    pxdbid <- cdata$historicData$PX_DIRTY_BID[today, stripnames]
    pxdask <- cdata$historicData$PX_DIRTY_ASK[today, stripnames]
    price <- as.numeric((pxbid + pxask) / 2)
    accrued <- as.numeric(pxdbid - pxbid)
    cashflows <- lapply(rownames(goodbonds), function(x) {
        goodflows <- cdata$cfData[[x]][as.Date(cdata$cfData[[x]][, "Date"]) >= today, ]
        #gfstipnames <- sapply(strsplit(rownames(goodflows), " "), function(x) x[1]) dunno if I need this
        isin <- rep(cdata$staticData[x, "ID_ISIN"], nrow(goodflows))
        cf <- apply(goodflows[, 2:3], 1, sum) / 10000
        dt <- as.Date(goodflows[, 1])
        return(list(isin = isin, cf = cf, dt = dt))
    })
    isinvec <- unlist(lapply(cashflows, function(x) x$isin))
    cfvec <- as.numeric(unlist(lapply(cashflows, function(x) x$cf)))
    datevec <- unlist(lapply(cashflows, function(x) x$dt))
    govbonds <- list(ISIN = goodbonds$ID_ISIN, 
                     MATURITYDATE = as.Date(goodbonds$MATURITY),
                     ISSUEDATE = as.Date(goodbonds$FIRST_SETTLE_DT),
                     COUPONRATE = as.numeric(goodbonds$COUPON) / 100,
                     PRICE = price,
                     ACCRUED = accrued,
                     CASHFLOWS = list(ISIN = isinvec, CF = cfvec, DATE = as.Date(datevec, origin = "1970-01-01")), # unlist() dropped the Date class
                     TODAY = settleDate)
    govbonds <- list(govbonds)
    names(govbonds) <- cCode
    class(govbonds) <- "couponbonds"
    return(govbonds)
}

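Look carefully at the cashflows <- lapply ... call, because that is where you create the sublist, and it is the heart of the answer to your question. How it is done depends very much on how you decide to build the intermediate data structures; I have given you one possibility. I realise my answer is complicated, but the question is very complicated. Not all the code you need is in this answer either: some helper functions are missing, but I am happy to provide them if you contact me. The skeleton of the core functionality is here, though, and really most of the problem lies in getting the data in the first place and structuring it appropriately. You correctly surmised that some of the data for each bond is static, some is dynamic and some is historic. The intermediate data structures therefore have different dimensions for different couponbonds objects. How you represent that is up to you; I have used separate lists/data frames for each, linked by bond ID where necessary.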
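
One detail worth knowing when flattening the per-bond pieces, as createCouponBonds does for the cash flow dates: unlist() strips the Date class and leaves day counts since 1970-01-01. A small sketch on made-up dates, showing two ways to end up with proper dates again:

```r
# Per-bond results shaped like the ones returned inside the cashflows lapply
per_bond <- list(
  list(dt = as.Date(c("2013-01-15", "2014-01-15"))),
  list(dt = as.Date("2013-07-04"))
)

# unlist() returns plain numbers, not Dates
dropped <- unlist(lapply(per_bond, function(x) x$dt))
class(dropped)

# Option 1: convert back with an explicit origin
restored <- as.Date(dropped, origin = "1970-01-01")

# Option 2: concatenate with c(), which keeps the Date class throughout
kept <- do.call(c, lapply(per_bond, function(x) x$dt))
format(kept)
```

Supplying the origin explicitly matters on R versions before 4.3, where as.Date() on a bare number fails without it.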

The function above takes a date string, so you can use lapply as above to run it for each historic data point and, hey presto, dyncouponbonds:

spl <- lapply(dodates, function(x) createCouponBonds("SP", x))
names(spl) <- sapply(spl, function(x) format(x$SP$TODAY)) # name each period by its settlement date
class(spl) <- "dyncouponbonds"

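There you go. You asked for it...

If you are not using Bloomberg, your input data structures will be very different, but as I said at the start, get familiar with lapply and sapply. There are obviously many other ways to approach this, but the above works for Bloomberg. If you understand this code, you will certainly know what to do for other data sources.

Finally, note that the Rbbg package from findata.org is used to connect to Bloomberg.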
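
For a non-Bloomberg source, one possible starting point is three flat tables: static bond data, a cash flow schedule, and a long price panel with one row per bond per date. The sketch below splits the panel by date and builds one period per slice. All table names, column names and values are assumptions, the label "TOY" is arbitrary, and the result has not been run through termstrc's estimation functions.

```r
# Made-up inputs; column names are chosen to mirror the target list elements
static <- data.frame(
  ISIN         = c("XX0000000001", "XX0000000002"),
  MATURITYDATE = as.Date(c("2014-01-15", "2015-01-15")),
  ISSUEDATE    = as.Date(c("2009-01-15", "2010-01-15")),
  COUPONRATE   = c(0.04, 0.03),
  stringsAsFactors = FALSE
)
cf <- data.frame(
  ISIN = rep(c("XX0000000001", "XX0000000002"), times = c(2, 3)),
  CF   = c(4, 104, 3, 3, 103),
  DATE = as.Date(c("2013-01-15", "2014-01-15",
                   "2013-01-15", "2014-01-15", "2015-01-15")),
  stringsAsFactors = FALSE
)
panel <- data.frame(
  TODAY   = as.Date(c("2012-11-21", "2012-11-21", "2012-11-22", "2012-11-22")),
  ISIN    = rep(c("XX0000000001", "XX0000000002"), times = 2),
  PRICE   = c(103.1, 104.2, 103.0, 104.3),
  ACCRUED = c(3.40, 2.55, 3.41, 2.56),
  stringsAsFactors = FALSE
)

# Build the one-period list for a single date's slice of the panel
make_period <- function(obs, static, cf, label = "TOY") {
  today <- obs$TODAY[1]
  s     <- static[match(obs$ISIN, static$ISIN), ]           # static rows in panel order
  flows <- cf[cf$ISIN %in% obs$ISIN & cf$DATE >= today, ]   # remaining payments only
  period <- list(
    ISIN = s$ISIN, MATURITYDATE = s$MATURITYDATE, ISSUEDATE = s$ISSUEDATE,
    COUPONRATE = s$COUPONRATE, PRICE = obs$PRICE, ACCRUED = obs$ACCRUED,
    CASHFLOWS = list(ISIN = flows$ISIN, CF = flows$CF, DATE = flows$DATE),
    TODAY = today
  )
  out <- list(period)
  names(out) <- label
  class(out) <- "couponbonds"
  out
}

# One couponbonds object per observation date, collected under the umbrella list
dyn <- lapply(split(panel, panel$TODAY), make_period, static = static, cf = cf)
class(dyn) <- "dyncouponbonds"
names(dyn)
```

split() names each slice by its date, so the umbrella list ends up named by period without any extra work.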

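Answer 1 (score: 0)

My 2 cents: I have been trying to get this working with the new Rblpapi. I still have some problems with the createCouponBonds part, but I think the other functions return correctly. It doesn't solve the whole problem, but it is at least a partial fix. BBcurveIDs, bbStaticDataFields, bbDynamicDataFields and bbHistoricDataFields are the same as above.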

bbGetCountry <- function(cCode, up = FALSE) {
  if (up == TRUE) startDate <- as.Date("2016-01-01") else startDate <- histStartDate 
  cal <- Calendar(weekdays=c("saturday", "sunday"))
  wdays <- as.list(bizseq(startDate, Sys.Date(), cal))
  actives <- lapply(wdays, function(x) { 
    bds(BBcurveIDs[cCode][[1]], "CURVE_MEMBERS", override = c(CURVE_DATE=format(x, "%Y%m%d")))
  })
  names(actives) <- wdays
  uniqueActives <- unique(unlist(actives))
  staticData <- bdp(uniqueActives, bbStaticDataFields)
  cfData <- lapply(uniqueActives, function(x) {
    bds(x, "DES_CASH_FLOW_ADJ", override = c(SETTLE_DT = format(as.Date(staticData[x, "FIRST_SETTLE_DT"]), "%Y%m%d")))
  })
  names(cfData) <- uniqueActives

  historicData <- lapply(bbHistoricDataFields, function(x) bbdh(uniqueActives, flds = x, startDate = startDate))
  names(historicData) <- bbHistoricDataFields
  allDates <- as.Date(index(historicData$LAST_PRICE))

  save(actives, file = paste("data_", cCode, "actives.dat", sep = ""))
  save(staticData, file = paste("data_", cCode, "staticData.dat", sep = ""))
  save(cfData, file = paste("data_", cCode, "cfData.dat", sep = ""))
  save(historicData, file = paste("data_", cCode, "historicData.dat", sep = ""))
  #save(settleDates, file = paste("data_", cCode, "settleDates.dat", sep = ""))
  assign(paste(cCode, "data", sep = ""), list(actives = actives, staticData = staticData, cfData = cfData,    #
                                              historicData = historicData), pos = 1)

}

And the bbdh function:

bbdh <- function(secs, years = 1, flds = "last_price", startDate = NULL) {
  if(is.null(startDate)) startDate <- Sys.Date() - years * 365.25
  if(inherits(startDate, "Date")) startDate <- format(startDate, "%Y%m%d")
  if(nchar(startDate) > 8) startDate <- format(as.Date(startDate), "%Y%m%d")
  rawd <- bdh(secs, flds, 
              startDate, 
              include.non.trading.days = FALSE,
              options = structure(c("PREVIOUS_VALUE", "NON_TRADING_WEEKDAYS"),
                                  names = c("nonTradingDayFillMethod","nonTradingDayFillOption")))
  rawd <- ldply(rawd, data.frame)
  colnames(rawd) <- c("sec", "date", "fld")
  rawd <- dcast(rawd, date ~ sec, value.var="fld")
  colnames(rawd) <- gsub(" Corp", "", colnames(rawd))
  return(xts(rawd[,-1], order.by=rawd[,1]))
}