I have been searching the web for a way to create rolling windows so that I can perform the cross-validation technique known as time-series analysis in a generic way.
However, I have not come across any solution that is flexible in both: 1) the window size (almost every approach supports this; e.g. pandas
rolling, or other somewhat different variants), and 2) the rolling amount, i.e. by how many indices we want the window to roll forward (I have not found this anywhere yet).
I have been optimizing and writing concise code with the help of @coldspeed (I cannot comment there since I have not reached the required reputation; hope to get there soon!), but I still have not been able to work out the rolling amount of the window.
My attempts:
1. I tried np.roll
together with the example below, without success.
2. I also tried modifying the code below to multiply by the i-th
value, but I could not make it fit into the list comprehension, which I would like to keep.
3. The example below works for a window of any size; however, it only rolls the window one step "forward", and I would like to generalize it to any step.
So: is there a way to use both parameters within a list-comprehension approach? Or is there some other resource I have not found that makes this easy? All help is greatly appreciated. My example code follows:
In [1]: import numpy as np
In [2]: arr = np.random.random((10,3))
In [3]: arr
Out[3]: array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887],
[0.4371343 , 0.08905587, 0.74511753]])
In [4]: inSamplePercentage = 0.4
In [5]: outSamplePercentage = 0.3 * inSamplePercentage
In [6]: windowSizeTrain = round(inSamplePercentage * arr.shape[0])
In [7]: windowSizeTest = round(outSamplePercentage * arr.shape[0])
In [8]: windowTrPlusTs = windowSizeTrain + windowSizeTest
In [9]: sliceListX = [arr[i: i + windowTrPlusTs] for i in range(len(arr) - (windowTrPlusTs-1))]
Given a window length of 5 and a rolling amount of 2, the expected result would be:
Out [15]:
[array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102]]),
array([[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358]]),
array([[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887]]),
array([[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887],
[0.4371343 , 0.08905587, 0.74511753]])]
(It includes the last array even though its length is smaller than 5.)
OR:
Out [16]:
[array([[0.38020065, 0.22656515, 0.25926935],
[0.13446667, 0.04386083, 0.47210474],
[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102]]),
array([[0.4374763 , 0.20024762, 0.50494097],
[0.49770835, 0.16381492, 0.6410294 ],
[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358]]),
array([[0.9711233 , 0.2004874 , 0.71186102],
[0.61729025, 0.72601898, 0.18970222],
[0.99308981, 0.80017134, 0.64955358],
[0.46632326, 0.37341677, 0.49950571],
[0.45753235, 0.55642914, 0.31972887]])]
(Only arrays with length == 5; however, this could be obtained from the output above with a simple mask.)
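For what it is worth, a plain list comprehension can already take both parameters by giving range a step; a minimal sketch (like Out[16], it keeps only full-length windows):

```python
import numpy as np

arr = np.arange(30).reshape((10, 3))
win_size, roll_qty = 5, 2

# step the start index by roll_qty; stop so that every slice has the full win_size rows
slices = [arr[i:i + win_size] for i in range(0, len(arr) - win_size + 1, roll_qty)]
print(len(slices))  # 3 windows, starting at rows 0, 2 and 4
```

Dropping the `- win_size + 1` bound would instead keep the trailing partial window, as in Out[15].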
EDIT: I forgot to mention this answer: something similar could be done if pandas rolling objects supported the iter method.
Answer 0 (score: 3)
IIUC, you can use np.lib.stride_tricks.as_strided
to create a view with the desired window size and rolling amount, e.g.:
# redefine arr to see better what is happening than with random numbers
arr = np.arange(30).reshape((10,3))
# get arr properties
arr_0, arr_1 = arr.shape
arr_is = arr.itemsize  # the size in bytes of one element of arr
# window and rolling parameters
win_size = 5
roll_qty = 2
# use as_strided by defining the right parameters:
from numpy.lib.stride_tricks import as_strided
print (as_strided( arr,
                   shape=(int((arr_0 - win_size)/roll_qty + 1), win_size, arr_1),
                   strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]],
[[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
And with another window size and rolling amount:
win_size = 4
roll_qty = 3
print( as_strided( arr,
                   shape=(int((arr_0 - win_size)/roll_qty + 1), win_size, arr_1),
                   strides=(roll_qty*arr_1*arr_is, arr_1*arr_is, arr_is)))
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17],
[18, 19, 20]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26],
[27, 28, 29]]])
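As a side note, on NumPy 1.20+ the same result can be obtained without hand-computing strides via np.lib.stride_tricks.sliding_window_view, applying the rolling amount by slicing the first axis; a sketch under that version assumption:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(30).reshape((10, 3))
win_size, roll_qty = 4, 3

# all contiguous windows of win_size full rows, then keep every roll_qty-th window
windows = sliding_window_view(arr, (win_size, arr.shape[1]))[::roll_qty, 0]
print(windows.shape)  # (3, 4, 3)
```

This avoids the main safety pitfall of as_strided (reading out of bounds if the shape/strides arithmetic is wrong), at the cost of building the one-step view first.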
Answer 1 (score: 2)
So, to give my 2 cents (with all the help from @Ben.T), the code below creates a basic "walk-forward analysis" tool for evaluating how your model(s) perform, in a more general way.
def walkForwardAnal(myArr, windowSize, rollQty):
    from numpy.lib.stride_tricks import as_strided
    ArrRows, ArrCols = myArr.shape
    ArrItems = myArr.itemsize
    sliceQtyAndShape = (int((ArrRows - windowSize) / rollQty + 1), windowSize, ArrCols)
    print('The final view shape is {}'.format(sliceQtyAndShape))
    ArrStrides = (rollQty * ArrCols * ArrItems, ArrCols * ArrItems, ArrItems)
    print('The final strides are {}'.format(ArrStrides))
    sliceList = list(as_strided(myArr, shape=sliceQtyAndShape, strides=ArrStrides, writeable=False))
    return sliceList
wSizeTr = 400
wSizeTe = 100
wSizeTot = wSizeTr + wSizeTe
rQty = 200
sliceListX = walkForwardAnal(X, wSizeTot, rQty)
sliceListY = walkForwardAnal(y, wSizeTot, rQty)
for sliceArrX, sliceArrY in zip(sliceListX, sliceListY):
    ## Consider making a .copy() of each array so that the original one is not modified:
    # XArr = sliceArrX.copy() and hence, changing Xtrain, Xtest = XArr[...]
    # YArr = sliceArrY.copy() and hence, changing Ytrain, Ytest = YArr[...]
    Xtrain = sliceArrX[:-wSizeTe, :]
    Xtest = sliceArrX[-wSizeTe:, :]
    Ytrain = sliceArrY[:-wSizeTe, :]
    Ytest = sliceArrY[-wSizeTe:, :]

    timeSeriesCrossVal = TimeSeriesSplit(n_splits=5)
    for trainIndex, testIndex in timeSeriesCrossVal.split(X):
        ## Check that the training and testing quantities make sense. If not, increase or decrease the n_splits parameter.
        Xtrain = X[trainIndex]
        Xtest = X[testIndex]
        Ytrain = y[trainIndex]
        Ytest = y[testIndex]

        # Fit on the training set only - the targets (y) are already encoded as dummy variables, so there is no need to standardize them.
        scaler = StandardScaler()
        scaler.fit(Xtrain)
        # Apply the transform to both the training set and the test set.
        trainX = scaler.transform(Xtrain)
        testX = scaler.transform(Xtest)

        ## PCA - Principal Component Analysis #### APPLY PCA TO THE STANDARDIZED TRAINING SET! :::: Fit on the training set only.
        pca = PCA(.95)
        pca.fit(trainX)
        # Apply the transform to both the training set and the test set.
        trainX = pca.transform(trainX)
        testX = pca.transform(testX)

        ## Predict and append predictions...
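To round off the elided last step, here is a hedged sketch of what the fit/predict part inside the loop could look like. LogisticRegression is only a placeholder model, and the random arrays stand in for the scaled/PCA-transformed sets above; none of this is specified in the original post:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
trainX = rng.random((80, 4))      # stand-in for the PCA-transformed training set
testX = rng.random((20, 4))       # stand-in for the PCA-transformed test set
Ytrain = rng.integers(0, 2, 80)   # hypothetical binary target

predictions = []                  # one array of predictions per walk-forward slice
model = LogisticRegression()      # placeholder model; swap in whatever you evaluate
model.fit(trainX, Ytrain)
predictions.append(model.predict(testX))
print(predictions[0].shape)  # (20,)
```

Collecting one prediction array per slice makes it easy to score each walk-forward step separately and inspect how performance drifts over time.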