I have a time series whose value at Time = 23:00:00 is always wrong, so I need to change those values.
Sample data:
library(data.table)
data <- data.table(
Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
TotalVolume = c(5,20,15,11,19,4)
)
It looks like this:
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 22.900 19
6: 01:02:00 21.286 4
I want to replace the Open value at Time = 23:00:00 with the Open value from the row immediately before it. The result should look like this:
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
I tried using the lag function, but did not get the expected result:
data$Open[data$Time == "23:00:00"] <- lag(data,1)
and
data$Open[data$Time == "23:00:00"] <- lag(data$Open[data$Time == "23:00:00"],1)
Answer 0 (score: 3)
> n <- which(data$Time=="23:00:00")
> data$Open[n] <- data$Open[n-1]
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
>
n contains the positions where Time is "23:00:00", so n-1 is the position immediately before each of them. The assignment data$Open[n] <- data$Open[n-1] therefore does what we want.
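One edge case the assignment above does not cover is a "23:00:00" row in the first position, where n-1 would be 0 and no previous row exists. The following is a minimal sketch, assuming such a row should simply be left unchanged; that choice is an assumption, not something stated in the question.

```r
library(data.table)

data <- data.table(
  Time = c("20:47:00", "20:52:00", "21:25:00", "22:25:00", "23:00:00", "01:02:00"),
  Open = c(21.306, 21.305, 21.305, 21.300, 22.900, 21.286),
  TotalVolume = c(5, 20, 15, 11, 19, 4)
)

# Positions of the rows to fix.
n <- which(data$Time == "23:00:00")

# Assumption: a match in row 1 has no previous row, so it is skipped.
n <- n[n > 1]

# Copy the Open value from the row immediately before each matched row.
data$Open[n] <- data$Open[n - 1]

print(data)
```

With the sample data only row 5 matches, so the guard changes nothing here; it only matters if the series happens to start at 23:00:00.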
Answer 1 (score: 2)
Try using:
> library(data.table)
> library(dplyr)
> data <- data.table(
+ Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+ Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+ TotalVolume = c(5,20,15,11,19,4)
+ )
> data <- data %>% mutate(Open = ifelse(Time == '23:00:00', lag(Open), Open))
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
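You can also do it this way, without the dplyr mutate function. Note that lag here is still dplyr's lag, so dplyr must remain loaded; stats::lag from base R does not shift the values of a plain vector: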
> data <- data.table(
+ Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+ Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+ TotalVolume = c(5,20,15,11,19,4)
+ )
> data$Open <- ifelse(data$Time == '23:00:00', lag(data$Open), data$Open)
> data
Time Open TotalVolume
1: 20:47:00 21.306 5
2: 20:52:00 21.305 20
3: 21:25:00 21.305 15
4: 22:25:00 21.300 11
5: 23:00:00 21.300 19
6: 01:02:00 21.286 4
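Since the data is already a data.table, the same replacement can probably be done without dplyr at all. The sketch below is an alternative that neither answer proposes: it uses data.table's shift() (which lags by one row by default) and fifelse(), and updates the column by reference with :=. It assumes a data.table version recent enough to provide fifelse().

```r
library(data.table)

data <- data.table(
  Time = c("20:47:00", "20:52:00", "21:25:00", "22:25:00", "23:00:00", "01:02:00"),
  Open = c(21.306, 21.305, 21.305, 21.300, 22.900, 21.286),
  TotalVolume = c(5, 20, 15, 11, 19, 4)
)

# shift(Open) is Open moved down by one row (the first element becomes NA).
# Where Time is "23:00:00", take the previous row's Open; otherwise keep Open.
data[, Open := fifelse(Time == "23:00:00", shift(Open), Open)]

print(data)
```

As with the lag-based versions, a "23:00:00" entry in the very first row would end up as NA, because there is no earlier value to copy.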