Change specific values to the lagged value in R

Time: 2015-12-31 21:21:40

Tags: r

I have a time series whose value at Time = 23:00:00 is always wrong, so I need to change those values.

Sample data:

library(data.table)
data <- data.table(
    Time = c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
    Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
    TotalVolume = c(5,20,15,11,19,4)
    )

Which looks like:

   Time        Open     TotalVolume
1: 20:47:00    21.306    5
2: 20:52:00    21.305    20
3: 21:25:00    21.305    15
4: 22:25:00    21.300    11
5: 23:00:00    22.900    19
6: 01:02:00    21.286    4

I want to replace the Open value at Time = 23:00:00 with the Open value from the previous row. The result should look like this:

   Time        Open     TotalVolume
1: 20:47:00    21.306    5
2: 20:52:00    21.305    20
3: 21:25:00    21.305    15
4: 22:25:00    21.300    11
5: 23:00:00    21.300    19
6: 01:02:00    21.286    4

I tried using the lag function, but did not get the expected result:

data$Open[data$Time == "23:00:00"] <- lag(data,1)

data$Open[data$Time == "23:00:00"] <- lag(data$Open[data$Time == "23:00:00"],1)

2 answers:

Answer 0: (score: 3)

> n <- which(data$Time=="23:00:00")
> data$Open[n] <- data$Open[n-1]
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4

n holds the positions where Time is "23:00:00", so n-1 is the position immediately before each of them. The assignment data$Open[n] <- data$Open[n-1] therefore does what we want.
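One edge case may be worth guarding against: if the very first row were a 23:00:00 row, n-1 would contain 0, R silently drops index 0, and the right-hand side would end up shorter than the left-hand side. A minimal sketch that skips such a row, assuming a plain data.frame (the name df and its values are made up for illustration, they are not from the question):

```r
# Hypothetical data: the first row is itself a 23:00:00 row
df <- data.frame(
    Time = c("23:00:00", "20:52:00", "22:25:00", "23:00:00", "01:02:00"),
    Open = c(22.900, 21.305, 21.300, 22.900, 21.286)
)

n <- which(df$Time == "23:00:00")
n <- n[n > 1]                   # row 1 has no previous row, so leave it alone
df$Open[n] <- df$Open[n - 1]    # copy the previous row's Open into each match
df
```

Here the first row keeps its original value, since there is nothing earlier to copy from; whether that is the right behaviour depends on the data.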

Answer 1: (score: 2)

Try using:

> library(data.table)
> library(dplyr)
> data <- data.table(
+     Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+     Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+     TotalVolume = c(5,20,15,11,19,4) 
+ )
> data <- data %>% mutate(Open = ifelse(Time == '23:00:00', lag(Open), Open))
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4
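A likely reason the attempts in the question did not work is that, without dplyr loaded, lag resolves to stats::lag, which only shifts the time base of a series and leaves the values where they are. The short comparison below is a sketch on a made-up vector x, assuming dplyr is installed:

```r
x <- c(21.300, 22.900, 21.286)   # illustrative values only

# stats::lag() changes the "tsp" attribute (the time base), not the values
s <- stats::lag(x, 1)
as.numeric(s)        # same values, same order as x

# dplyr::lag() moves the values down one position and pads with NA
d <- dplyr::lag(x)
d
```

So the ifelse(..., lag(Open), Open) pattern relies on dplyr's version of lag being the one that is called.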

You can also do it this way, without dplyr's mutate function (lag here is still dplyr's lag, so dplyr must remain loaded):

> data <- data.table(
+     Time =c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
+     Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
+     TotalVolume = c(5,20,15,11,19,4) 
+ )
> data$Open <- ifelse(data$Time == '23:00:00', lag(data$Open), data$Open)
> data
       Time   Open TotalVolume
1: 20:47:00 21.306           5
2: 20:52:00 21.305          20
3: 21:25:00 21.305          15
4: 22:25:00 21.300          11
5: 23:00:00 21.300          19
6: 01:02:00 21.286           4
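Since the data is already a data.table, the same replacement can probably also be written with data.table's own shift() and fifelse(), without loading dplyr. This is a sketch rather than either answerer's method, and it assumes a data.table release recent enough to provide fifelse():

```r
library(data.table)

data <- data.table(
    Time = c("20:47:00","20:52:00","21:25:00","22:25:00","23:00:00","01:02:00"),
    Open = c(21.306,21.305,21.305,21.300,22.900,21.286),
    TotalVolume = c(5,20,15,11,19,4)
)

# shift(Open) is Open moved down by one row (NA in row 1);
# := updates the Open column by reference, without copying the table
data[, Open := fifelse(Time == "23:00:00", shift(Open), Open)]
data
```

If the first row were a 23:00:00 row, this version would put NA there, because shift() has no earlier value to supply.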