rxDataStep using lagged values

Time: 2015-04-16 13:21:41

Tags: revolution-r

In SAS, it's possible to step through a dataset and use lagged values.

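The way I would do it is to use a function that performs the "lag", but this would presumably produce a wrong value at the beginning of each chunk. For example, if a chunk starts at row 200,000, it will assume an NA for a lagged value that should instead come from row 199,999.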
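
To make the problem concrete, here is a minimal base R sketch (no RevoScaleR needed). It assumes a made-up vector of 10 values and a hypothetical chunk size of 4; `naive_lag` is an illustrative helper, not a RevoScaleR function. Lagging each chunk independently puts an NA at the start of every chunk, not just at the first row of the data.

```r
# Naive lag: shift values down one row and put NA in front.
naive_lag <- function(x) c(NA, x[-length(x)])

x <- 1:10

# Reference result when the whole vector is processed at once.
full_lag <- naive_lag(x)

# Hypothetical chunking: split the same data into chunks of 4 rows and
# lag each chunk independently, the way a per-chunk transform would.
chunks <- split(x, ceiling(seq_along(x) / 4))
chunked_lag <- unlist(lapply(chunks, naive_lag), use.names = FALSE)

# Rows where the chunked result is NA but the full-data result is not.
wrong_rows <- which(is.na(chunked_lag) & !is.na(full_lag))
print(wrong_rows)
```

The rows flagged here are the first rows of the second and third chunks (rows 5 and 9), which is exactly the boundary problem described above.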

Is there a workaround for this?

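2 answers:

Answer 0: (score: 0)

You're exactly right about the chunking problem. The workaround is to use .rxGet and .rxSet to pass values between chunks. Here's the function: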

lagVar <- function(dataList) {

    # .rxStartRow returns the overall row number of the first row in this
    # chunk. So - the first row of the first chunk is equal to one.
    # If this is the very first row, there's no previous value to use - so
    # it's just an NA.
    if (.rxStartRow == 1) {

        # Put the NA out front, then shift all the other values down one row.
        # newName is the desired name of the lagged variable, set using
        # transformObjects - see below
        dataList[[newName]] <- c(NA, dataList[[varToLag]][-.rxNumRows])

    } else {

        # If this isn't the very first chunk, we have to fetch the previous
        # value from the previous chunk using .rxGet, then shift all other
        # values down one row, just as before.
        dataList[[newName]] <- c(.rxGet("lastValue"),
                                 dataList[[varToLag]][-.rxNumRows])

    }

    # Finally, once this chunk is done processing, set its lastValue so that
    # the next chunk can use it.
    .rxSet("lastValue", dataList[[varToLag]][.rxNumRows])

    # Return dataList with the new variable
    dataList

}

And here's how to use it in rxDataStep:

# Get a sample dataset
xdfPath <- file.path(rxGetOption("sampleDataDir"), "DJIAdaily.xdf")

# Set a path to a temporary file
xdfLagged <- tempfile(fileext = ".xdf")

# Sort the dataset chronologically - otherwise, the lagging will be random.
rxSort(inData = xdfPath,
       outFile = xdfLagged,
       sortByVars = "Date")

# Finally, put the lagging function to use:
rxDataStep(inData = xdfLagged, 
           outFile = xdfLagged,
           transformObjects = list(
               varToLag = "Open", 
               newName = "previousOpen"), 
           transformFunc = lagVar,
           append = "cols",
           overwrite = TRUE)

# Check the results
rxDataStep(xdfLagged, 
           varsToKeep = c("Date", "Open", "previousOpen"),
           numRows = 10)

The function itself isn't pretty, but using it is quite simple. Hope this helps.
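
If you want to sanity-check the carry-over logic without RevoScaleR, the following base R sketch imitates it under stated assumptions: the `state` environment is a hypothetical stand-in for the storage that .rxGet and .rxSet provide, `lag_chunk` and `start_row` are illustrative names (the latter playing the role of .rxStartRow), and the chunks are processed strictly in order.

```r
# Hypothetical stand-in for chunk-persistent storage: an environment
# that survives between calls to the per-chunk function.
state <- new.env()

lag_chunk <- function(chunk, start_row) {
    n <- length(chunk)
    # The first chunk has no predecessor; later chunks reuse the saved value.
    first <- if (start_row == 1) NA else get("lastValue", envir = state)
    lagged <- c(first, chunk[-n])
    # Save this chunk's last value so the next chunk can pick it up.
    assign("lastValue", chunk[n], envir = state)
    lagged
}

x <- c(10, 20, 30, 40, 50, 60, 70)
chunk_size <- 3
starts <- seq(1, length(x), by = chunk_size)

# Process the chunks one after another, in row order.
lagged <- unlist(lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1, length(x))
    lag_chunk(x[rows], start_row = s)
}))

print(lagged)
```

The result matches a lag computed over the whole vector in one pass, including for the last chunk, which holds only a single row. The sketch depends on in-order processing just as the real function depends on the data being sorted first.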

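Answer 1: (score: 0)

Here's another approach to lagging: a self-merge using a shifted date. It's dramatically simpler to code, and it can lag several variables at once. The downsides are that it takes 2-3 times longer to run than the transformFunc answer, and it requires a second copy of the dataset.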

# Get a sample dataset
sourcePath <- file.path(rxGetOption("sampleDataDir"), "DJIAdaily.xdf")

# Set up paths for two copies of it
xdfPath <- tempfile(fileext = ".xdf")
xdfPathShifted <- tempfile(fileext = ".xdf")


# Convert "Date" to be Date-classed
rxDataStep(inData = sourcePath,
           outFile = xdfPath,
           transforms = list(Date = as.Date(Date)),
           overwrite = TRUE
)


# Then make the second copy, but shift all the dates up 
# one (or however much you want to lag)
# Use varsToKeep to subset to just the date and 
# the variables you want to lag
rxDataStep(inData = xdfPath,
           outFile = xdfPathShifted,
           varsToKeep = c("Date", "Open", "Close"),
           transforms = list(Date = as.Date(Date) + 1),
           overwrite = TRUE
)

# Create an output XDF (or just overwrite xdfPath)
xdfLagged2 <- tempfile(fileext = ".xdf")

# Use that incremented date to merge variables back on.
# duplicateVarExt will automatically tag variables from the 
# second dataset as "Lagged".
# Note that there's no need to sort manually in this one - 
# rxMerge does it automatically.
rxMerge(inData1 = xdfPath,
        inData2 = xdfPathShifted,
        outFile = xdfLagged2,
        matchVars = "Date",
        type = "left",
        duplicateVarExt = c("", "Lagged")
)
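
One thing worth checking with this approach is what "lag" means when dates have gaps. The base R sketch below uses a made-up three-row price series and `merge()` rather than rxMerge, so it only illustrates the idea: shifting by one calendar day lags by date, not by row.

```r
# Small made-up price series: Thursday, Friday, then Monday (no weekend rows).
prices <- data.frame(
    Date = as.Date(c("2015-04-09", "2015-04-10", "2015-04-13")),
    Open = c(100, 101, 102)
)

# Second copy with every date moved forward one calendar day.
shifted <- data.frame(Date = prices$Date + 1, OpenLagged = prices$Open)

# Left join on Date: each row picks up the value dated one day earlier.
merged <- merge(prices, shifted, by = "Date", all.x = TRUE)

# Make the row order explicit before inspecting the result.
merged <- merged[order(merged$Date), ]
print(merged)
```

In this toy data only Friday finds a match. Monday's lagged value is NA because there is no Sunday row to shift onto it. If the real dataset also skips weekends and holidays, the transformFunc approach (which lags by row) and the date-shift merge will give different answers on those days, so pick whichever definition of "previous" you actually need.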