在SAS中,可以通过数据集并使用滞后值。
我这样做的方法是使用一个执行“滞后”的函数,但这可能会在块的开头产生错误的值。例如,如果一个块从第200行开始,那么它将为一个滞后值假定一个NA,该值应该来自第199,999行。
有解决方法吗?
答案 0 :(得分:0)
你对分块问题完全正确。解决方法是使用rxGet
和rxSet
在块之间传递值。这是功能:
lagVar <- function(dataList) {
# .rxStartRow returns the overall row number of the first row in this
# chunk. So - the first row of the first chunk is equal to one.
# If this is the very first row, there's no previous value to use - so
# it's just an NA.
if(.rxStartRow == 1) {
# Put the NA out front, then shift all the other values down one row.
# newName is the desired name of the lagged variable, set using
# transformObjects - see below
dataList[[newName]] <- c(NA, dataList[[varToLag]][-.rxNumRows])
} else {
# If this isn't the very first chunk, we have to fetch the previous
# value from the previous chunk using .rxGet, then shift all other
# values down one row, just as before.
dataList[[newName]] <- c(.rxGet("lastValue"),
dataList[[varToLag]][-.rxNumRows])
}
# Finally, once this chunk is done processing, set its lastValue so that
# the next chunk can use it.
.rxSet("lastValue", dataList[[varToLag]][.rxNumRows])
# Return dataList with the new variable
dataList
}
以及如何在rxDataStep
中使用它:
# Get a sample dataset
xdfPath <- file.path(rxGetOption("sampleDataDir"), "DJIAdaily.xdf")
# Set a path to a temporary file
xdfLagged <- tempfile(fileext = ".xdf")
# Sort the dataset chronologically - otherwise, the lagging will be random.
rxSort(inData = xdfPath,
outFile = xdfLagged,
sortByVars = "Date")
# Finally, put the lagging function to use:
rxDataStep(inData = xdfLagged,
outFile = xdfLagged,
transformObjects = list(
varToLag = "Open",
newName = "previousOpen"),
transformFunc = lagVar,
append = "cols",
overwrite = TRUE)
# Check the results
rxDataStep(xdfLagged,
varsToKeep = c("Date", "Open", "previousOpen"),
numRows = 10)
该功能本身并不漂亮,但使用它非常简单。希望这会有所帮助。
答案 1 :(得分:0)
这是另一种滞后方法:使用转移日期进行自我合并。这对代码来说非常简单,并且可以一次滞后几个变量。缺点是运行时间比使用transformFunc
的答案长2-3倍,并且需要数据集的第二个副本。
# Get a sample dataset
sourcePath <- file.path(rxGetOption("sampleDataDir"), "DJIAdaily.xdf")
# Set up paths for two copies of it
xdfPath <- tempfile(fileext = ".xdf")
xdfPathShifted <- tempfile(fileext = ".xdf")
# Convert "Date" to be Date-classed
rxDataStep(inData = sourcePath,
outFile = xdfPath,
transforms = list(Date = as.Date(Date)),
overwrite = TRUE
)
# Then make the second copy, but shift all the dates up
# one (or however much you want to lag)
# Use varsToKeep to subset to just the date and
# the variables you want to lag
rxDataStep(inData = xdfPath,
outFile = xdfPathShifted,
varsToKeep = c("Date", "Open", "Close"),
transforms = list(Date = as.Date(Date) + 1),
overwrite = TRUE
)
# Create an output XDF (or just overwrite xdfPath)
xdfLagged2 <- tempfile(fileext = ".xdf")
# Use that incremented date to merge variables back on.
# duplicateVarExt will automatically tag variables from the
# second dataset as "Lagged".
# Note that there's no need to sort manually in this one -
# rxMerge does it automatically.
rxMerge(inData1 = xdfPath,
inData2 = xdfPathShifted,
outFile = xdfLagged2,
matchVars = "Date",
type = "left",
duplicateVarExt = c("", "Lagged")
)