在我第一次使用R的尝试中,我编写了两个功能,这些功能并不是非常高性能,如果我可以获得一些关于如何使它们更高性能(矢量化)的提示,我将不胜感激。这两个功能最后都带有“测试用例”。
第一个函数采用两个时间序列xts对象x和y,并返回一个系列,其中包含有关x高于/低于y的天数的数据。
require('xts')
require('quantmod')
countDaysBelowOrAbove <- function(x, y) {
x <- try.xts(x, error=as.matrix)
y <- try.xts(y, error=as.matrix)
if(is.xts(x) && is.xts(y)) {
xy <- cbind(x,y)
} else {
xy <- cbind( as.vector(x), as.vector(y) )
}
# Count NAs, ensure they're only at beginning of data, then remove.
xNAs <- sum( is.na(x) )
yNAs <- sum( is.na(y) )
NAs <- max( xNAs, yNAs )
if( NAs > 0 ) {
if( any( is.na(xy[-(1:NAs),]) ) ) stop("Series contain non-leading NAs")
}
resultDaysLower <- x
resultDaysHigher <- x
resultDaysLower[!is.na(resultDaysLower)]<-0
resultDaysHigher[!is.na(resultDaysHigher)]<-0
series<-cbind(xy, resultDaysLower, resultDaysHigher)
colnames(series) <- c(names(xy), "cumDaysLower", "cumDaysHigher")
daysLower = 0
daysHigher = 0
for (i in 1:NROW(xy)) {
if (!(is.na(series[,1][i]) | is.na(series[,2][i]))) {
if (series[,1][i] >= series[,2][i]) {
daysLower = 0
daysHigher = daysHigher + 1
}
else {
daysHigher = 0
daysLower = daysLower + 1
}
}
else {
daysLower = 0
daysHigher = 0
}
series$cumDaysLower[i] = daysLower
series$cumDaysHigher[i] = daysHigher
}
return(series)
}
getSymbols("SPY", from='2005-01-01')
SPYclose = Cl(SPY)
getSymbols("QQQQ", from='2005-01-01')
QQQQclose = Cl(QQQQ)
testData = countDaysBelowOrAbove(SPYclose, QQQQclose)
第二个功能我将非常感谢有关性能优化的帮助如下。该函数将xts对象系列作为参数,使用表示间隔长度的xts对象来计算指定时间的系列最小值。该函数返回计算出的最小系列,其中包含指定窗口的最小计算长度。
minimumWithVaryingLength<-function(series, lengths) {
series <- try.xts(series, error=as.matrix)
lengths <- try.xts(lengths, error=as.matrix)
if(is.xts(series) && is.xts(lengths)) {
serieslengths <- cbind(series,lengths)
} else {
serieslengths <- cbind( as.vector(series), as.vector(lengths) )
}
# Count NAs, ensure they're only at beginning of data, then remove.
seriesNAs <- sum( is.na(series) )
lengthsNAs <- sum( is.na(lengths) )
NAs <- max( seriesNAs, lengthsNAs )
if( NAs > 0 ) {
if( any( is.na(serieslengths[-(1:NAs),]) ) ) stop("Series contain non-leading NAs")
}
result <- series
result[!is.na(result)]<-0
for (i in 1:NROW(serieslengths)) {
if (lengths[i] > 0) {
result[i] <- runMin(series, n=lengths[i], cumulative=FALSE)[i]
}
else {
result[i] <- 0
}
}
return(result)
}
getSymbols("SPY", from='2005-01-01')
SPYclose = Cl(SPY)
getSymbols("QQQQ", from='2005-01-01')
QQQQclose = Cl(QQQQ)
numDaysBelow = countDaysBelowOrAbove(SPYclose, QQQQclose)
test = minimumWithVaryingLength(SPYclose, numDaysBelow)
提前感谢您的帮助。
亲切的问候, 萨莫。
答案 0 :(得分:4)
对于第一个功能,您要查找系列x
低于/高于y
的累计期数。为此,您可以使用从CumCount()
构建的此便捷函数cummax
。首先是一些样本数据:
set.seed(1)
x <- sample(1:5,20,T)
y <- sample(1:5,20,T)
CumCount <- function(x) {
z <- cumsum(x)
z - cummax(z*(!x))
}
CumLow = CumCount(x<y)
CumHigh = CumCount(x>y)
对于第二次计算,您尝试在x
期间找到累积最小 x < y
值在每个时间段中。为此,rle
函数非常有用(“run-length-encoding”)。
# runs equals the length of each phase (x < y or x > y)
runs <- rle(CumLow > 0)$lengths
# starts is the number of periods prior to each phase...
starts <- c(0,cumsum(runs)[-length(runs)])
#... which we use to build "blocks", a list of indices of each phase.
blocks <- mapply( function(x,y) x+y, starts, lapply(runs,seq))
# now apply the cummin function within each block:
# (remember to mask it by CumLow > 0 --
# we only want to do this within the x<y phase)
BlockCumMin <- unlist(sapply(blocks, function(blk) cummin(x[blk]))) * (CumLow > 0)
现在我们把它们放在一起:
> cbind(x,y, CumLow, CumHigh, BlockCumMin)
x y CumLow CumHigh BlockCumMin
[1,] 3 4 1 0 3
[2,] 4 2 0 1 0
[3,] 2 2 0 0 0
[4,] 2 5 1 0 2
[5,] 4 4 0 0 0
[6,] 2 2 0 0 0
[7,] 4 1 0 1 0
[8,] 1 3 1 0 1
[9,] 2 5 2 0 1
[10,] 1 3 3 0 1
[11,] 2 5 4 0 1
[12,] 1 4 5 0 1
[13,] 4 2 0 1 0
[14,] 5 3 0 2 0
[15,] 4 1 0 3 0
[16,] 4 1 0 4 0
[17,] 3 4 1 0 3
[18,] 3 1 0 1 0
[19,] 5 3 0 2 0
[20,] 4 4 0 0 0
请注意,此问题与this question
有关 更新。 对于您有series
向量的更一般情况,lengths
向量(长度与{{1相同) }}),并且您希望生成一个名为series
的结果,其中BlockMins
是位于BlockMins[i]
结尾的lengths[i]
series
块的最小值,您可以做到以下几点。由于长度是任意的,因此不再是累积最小值;对于每个i
,您必须获取i
length[i]
元素的最小值,其结尾位置为series
:
i
答案 1 :(得分:1)
如果您没有处理时间序列设备,如果您有两个向量x和y,并希望“返回一个包含多少天x高于/低于y的数据的系列”,只需比较它们:
# Make up some data
x <- seq(100)
y <- x[sample(x)]
# Compare
x.greater <- sum(x>y)
x.lesser <- sum(x<y)
这个的关键是当你对逻辑矢量求和时,例如(x> y),R将TRUE变为1,将FALSE变为0。