I have this data:
date signal
1 2009-01-13 09:55:00 4645.00 4838.931 5358.883 Buy2
2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
3 2009-01-15 09:55:00 4485.00 4653.316 5274.384 Buy2
4 2009-01-16 09:55:00 4580.00 4537.693 5141.435 Buy1
5 2009-01-19 09:55:00 4532.00 4548.088 4891.041 Buy2
6 2009-01-27 09:55:00 4190.00 4183.503 4548.497 Buy1
7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
9 2009-02-09 09:55:00 4469.00 4203.437 4376.277 Sell1
10 2009-02-12 09:55:00 4469.90 4220.845 4503.798 Sell2
11 2009-02-13 09:55:00 4553.00 4261.980 4529.777 Sell1
12 2009-02-16 09:55:00 4347.20 4319.656 4564.387 Sell2
13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
15 2009-03-02 09:55:00 3636.00 3846.423 4036.020 Buy2
16 2009-03-12 09:55:00 3420.00 3372.665 3734.949 Buy1
17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
19 2009-03-18 09:55:00 3721.00 3363.735 3682.293 Sell1
20 2009-03-20 09:55:00 3687.00 3440.651 3784.778 Sell2
and it must be arranged in this form:
2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
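so that the rows follow the order Buy1, Sell1, Sell2, Buy2 and the observations in between are dropped. I have tried several dplyr::filter commands, but none of them gives the desired output.
Answer 0 (score: 0)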
If I have understood your question correctly, the following code solves it. Adapted from this discussion.
The idea is to define the sequence as a pattern:
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")
Then find the positions where that pattern starts in your column:
library(zoo)
pos <- which(rollapply(data$signal, 4, identical, pattern, fill = FALSE, align = "left"))
and extract the rows covered by each match, starting at the pattern position:
rows <- unlist(lapply(pos, function(x, n) seq(x, x+n-1), 4))
data_filtered <- data[rows,]
Voilà.
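To see what this window-matching step returns, here is a minimal base R sketch of the same idea that does not need zoo. The toy vector is hypothetical and is not the asker's data. Note that a sliding window only finds occurrences where the four signals sit on consecutive rows.

```r
# Hypothetical toy vector, used only to illustrate the window matching
signal  <- c("Buy2", "Buy1", "Sell1", "Sell2", "Buy2", "Buy1", "Sell2")
pattern <- c("Buy1", "Sell1", "Sell2", "Buy2")
n <- length(pattern)

# Candidate start positions of a window of length n
starts <- seq_len(max(0, length(signal) - n + 1))

# Keep the starts whose window equals the pattern exactly
is_match <- vapply(starts,
                   function(i) identical(signal[i:(i + n - 1)], pattern),
                   logical(1))
pos <- starts[is_match]

# Expand each start position into the n row indices it covers
rows <- unlist(lapply(pos, function(x) seq(x, x + n - 1)))

print(pos)   # 2
print(rows)  # 2 3 4 5
```

Rows 6 and 7 are not returned because "Buy1", "Sell2" is not a complete, contiguous occurrence of the pattern.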
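Edit
Since I misunderstood your question, here is a new solution. You want to retrieve the sequence "Buy1", "Sell1", "Sell2", "Buy2" from the column and drop the observations that do not fit it. I do not see a simple vectorized solution, so here is a loop that solves the problem. Depending on the size of your data, you may want to implement a similar algorithm in Rcpp or vectorize it in some way.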
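sequence <- c("Buy1", "Sell1", "Sell2", "Buy2")
keep <- logical(length(data$signal))
# s is the zero-based position of the next expected element of the sequence
s <- 0
for (i in seq_along(data$signal)) {
  if (sequence[s + 1] == data$signal[i]) {
    keep[i] <- TRUE
    # advance to the next expected signal, wrapping around after "Buy2"
    s <- (s + 1) %% 4
  } else {
    keep[i] <- FALSE
  }
}
data_filtered <- data[keep, ]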
Let me know if this works better. If anyone has a vectorized solution, I would be curious to see it.
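As a quick check, the loop above can be wrapped in a function and run on the signal column of the 20 rows shown in the question. The helper name `keep_in_cycle` is hypothetical and introduced only for this sketch. The signals are typed in by hand from the question's table.

```r
# Hypothetical helper (not from the original answer): flags the elements of
# `signal` that continue the repeating `pattern`, skipping everything else.
keep_in_cycle <- function(signal, pattern) {
  keep <- logical(length(signal))
  s <- 0  # zero-based index of the next expected pattern element
  for (i in seq_along(signal)) {
    if (signal[i] == pattern[s + 1]) {
      keep[i] <- TRUE
      s <- (s + 1) %% length(pattern)
    }
  }
  keep
}

# Signal column of the 20 rows shown in the question
signal <- c("Buy2", "Buy1", "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2", "Sell1", "Sell2",
            "Buy2", "Buy1", "Buy2", "Buy1",
            "Sell1", "Sell2", "Sell1", "Sell2")

kept <- which(keep_in_cycle(signal, c("Buy1", "Sell1", "Sell2", "Buy2")))
print(kept)  # 2 7 8 13 14 17 18
```

The kept row numbers are the ones listed in the desired output of the question.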
Answer 1 (score: 0)
You can convert the column data$signal to a factor and define its levels.
data$signal <- factor(data$signal, levels = c("Buy1", "Sell1", "Buy2", "Sell2"))
Then you can sort by it:
sorted.data <- data[order(data$signal), ]
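A small sketch of what sorting by a factor does may help when comparing this approach with the desired output. The toy vector is hypothetical. Ordering by a factor groups rows by level. It does not walk through the sequence repeatedly, so rows with the same signal end up next to each other.

```r
# Hypothetical toy vector, used only to illustrate factor ordering
signal <- c("Buy2", "Buy1", "Sell1", "Sell2", "Buy1")
f <- factor(signal, levels = c("Buy1", "Sell1", "Buy2", "Sell2"))

# order() sorts by the position of each value in `levels`
idx <- order(f)
print(idx)                   # 2 5 3 1 4
print(as.character(f[idx]))  # "Buy1" "Buy1" "Sell1" "Buy2" "Sell2"
```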
Here is a good answer that illustrates what you want to do:
Answer 2 (score: 0)
Here is an Rcpp solution:
library(Rcpp)
cppFunction('LogicalVector FindHit(const CharacterVector x, const CharacterVector y) {
  LogicalVector res(x.size());
  int k = 0;
  for (int i = 0; i < x.size(); i++) {
    if (x[i] == y[k]) {
      res[i] = true;
      k = (k + 1) % y.size();
    }
  }
  return res;
}')
dtt[FindHit(dtt$V6, c('Buy1', 'Sell1', 'Sell2', 'Buy2')),]
# V1 V2 V3 V4 V5 V6
# 2 2009-01-14 09:55:00 4767.50 4718.254 5336.703 Buy1
# 7 2009-01-30 09:55:00 4436.00 4155.236 4377.907 Sell1
# 8 2009-02-02 09:55:00 4217.00 4152.626 4390.802 Sell2
# 13 2009-02-17 09:55:00 4161.05 4371.474 4548.912 Buy2
# 14 2009-02-27 09:55:00 3875.55 3862.085 4101.929 Buy1
# 17 2009-03-13 09:55:00 3656.00 3372.100 3605.357 Sell1
# 18 2009-03-17 09:55:00 3650.00 3360.421 3663.322 Sell2
Here dtt is the data from the question, with the signal in column V6.