Get the previous row value of a calculation when n is variable

Time: 2018-03-05 17:28:42

Tags: r

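I have a dataset of dates and maintenance qualifications. I need to get the time elapsed since the last maintenance touch. I read about shift(), but the number of rows back to the previous touch varies. Here is a sample dataset.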

         Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch mxTouchDate
1  19W-0001-001      1   2018-02-15 07:29:00    2018-02-15 08:14:00  NA  NA       0  1518682440
2  19W-0001-002      2   2018-02-15 09:34:00    2018-02-15 10:39:00  NA  NA      NA          NA
3  19W-0001-003      3   2018-02-15 12:07:00    2018-02-15 13:00:00  NA  NA      NA          NA
4  19W-0001-004      4   2018-02-15 14:36:00    2018-02-15 15:21:00  NA  NA      NA          NA
5  19W-0001-005      5   2018-02-15 18:03:00    2018-02-15 18:43:00  NA  NA      NA          NA
6  19W-0001-006      6   2018-02-15 20:59:00    2018-02-16 06:59:00   1  NA       1  1518764340
7  19W-0001-007      7   2018-02-16 09:40:00    2018-02-16 10:29:00  NA  NA      NA          NA
8  19W-0001-008      8   2018-02-16 12:59:00    2018-02-16 13:55:00  NA  NA      NA          NA
9  19W-0001-009      9   2018-02-16 16:28:00    2018-02-16 17:10:00  NA  NA      NA          NA
10 19W-0001-010     10   2018-02-16 19:45:00    2018-02-16 20:46:00  NA  NA      NA          NA
11 19W-0001-011     11   2018-02-16 21:54:00    2018-02-17 08:00:00  NA  NA      NA          NA
12 19W-0001-012     12   2018-02-17 09:23:00    2018-02-17 10:25:00  NA  NA      NA          NA
13 19W-0001-013     13   2018-02-17 14:58:00    2018-02-17 15:50:00  NA  NA      NA          NA
14 19W-0001-014     14   2018-02-17 18:26:00    2018-02-17 20:20:00  NA  NA      NA          NA
15 19W-0001-015     15   2018-02-17 23:00:00    2018-02-18 08:25:00   1  NA       1  1518942300

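I know how to do this in a loop, but I am trying to avoid loops in R. Basically, what I am looking for is:

If mxTouch is NA then MX_IN_Local_Date_Time - previous mxTouchDate

My final output should be the total number of minutes since the last mxTouchDate. Thanks for your help.

To clarify, here is an example of what I am looking for.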

   date mxTouch sinceLastMxTouch
1     1       0                0
2     2      NA                1
3     3      NA                2
4     4       1                3
5     5      NA                1
6     6      NA                2
7     7      NA                3
8     8      NA                4
9     9       1                5
10   10      NA                1

When mxTouch == 0, sinceLastMxTouch = 0. When mxTouch is NA or 1, sinceLastMxTouch = date - the date on which the last mxTouch occurred.

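3 answers:

Answer 0: (score: 1)

The lag() function from the dplyr library is most likely what you are looking for. I am going to assume that the previous entry is always the row directly before it (i.e., the data frame is sorted appropriately).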

library(dplyr)  # lag() below is dplyr::lag(); base R's stats::lag() behaves differently

df <- data.frame(val=1:10,other=c(5,5,5,5,NA,5,5,5,5,NA))
df
#    val other
#1    1     5
#2    2     5
#3    3     5
#4    4     5
#5    5    NA
#6    6     5
#7    7     5
#8    8     5
#9    9     5
#10  10    NA

# Difference between each element and the one before it.
# The first element is NA (nothing precedes it); every other difference is 1 here.
differences <- df$val - lag(df$val)
# Replace only the NA entries of `other` with the matching differences.
df$other[is.na(df$other)] <- differences[is.na(df$other)]
df
#    val other
#1    1     5
#2    2     5
#3    3     5
#4    4     5
#5    5     1
#6    6     5
#7    7     5
#8    8     5
#9    9     5
#10  10     1

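Alternatively, you can do this in base R with diff(). The advantage of lag() is that when you pass it a vector of size N it returns a vector of size N, whereas diff() gives you a vector of size N-1. With diff() you just have to make sure the indices line up when you replace the values.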
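To make the index remark concrete, here is a minimal base R sketch of the same replacement using diff(). It reuses the toy data frame from this answer; padding the result with a leading NA is one possible way to line the N-1 differences up with the N rows, not the only one.

```r
# Same toy data as in the answer above
df <- data.frame(val = 1:10, other = c(5, 5, 5, 5, NA, 5, 5, 5, 5, NA))

# diff() returns N-1 values; a leading NA restores length N,
# so position i holds val[i] - val[i - 1]
differences <- c(NA, diff(df$val))

# Replace only the NA entries of `other`
na_rows <- is.na(df$other)
df$other[na_rows] <- differences[na_rows]

df$other
# 5 5 5 5 1 5 5 5 5 1
```

Without the padding, `differences[i]` would hold the difference for row i + 1, so every replacement would be off by one row.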

Answer 1: (score: 1)

Hope this helps!

library(lubridate)
library(dplyr)
library(tidyr)
#library(zoo)

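# Convert both columns to date-time types (UTC/GMT) so they can be subtracted
df$MX_IN_Local_Date_Time <- ymd_hms(df$MX_IN_Local_Date_Time)
df$mxTouchDate <- as.POSIXct(df$mxTouchDate, origin="1970-01-01", tz="GMT")

df %>%
  # carry the last known touch time down into the NA rows
  mutate(mxTouchDate_temp = mxTouchDate) %>%
  fill(mxTouchDate_temp) %>%
  #mutate(mxTouchDate_temp = na.locf(mxTouchDate)) %>%
  # subtracting POSIXct values lets difftime pick the units itself (hours for this data);
  # as.integer() then truncates to whole units
  mutate(sinceLastMxTouch = ifelse((is.na(mxTouch) | mxTouch == 1),
                                   as.integer(MX_IN_Local_Date_Time - lag(mxTouchDate_temp)),
                                   mxTouch)) %>%
  select(-mxTouchDate_temp)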

The output is:

         Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch         mxTouchDate sinceLastMxTouch
1  19W-0001-001      1   2018-02-15 07:29:00    2018-02-15 08:14:00  NA  NA       0 2018-02-15 08:14:00                0
2  19W-0001-002      2   2018-02-15 09:34:00    2018-02-15 10:39:00  NA  NA      NA                <NA>                1
3  19W-0001-003      3   2018-02-15 12:07:00    2018-02-15 13:00:00  NA  NA      NA                <NA>                3
4  19W-0001-004      4   2018-02-15 14:36:00    2018-02-15 15:21:00  NA  NA      NA                <NA>                6
5  19W-0001-005      5   2018-02-15 18:03:00    2018-02-15 18:43:00  NA  NA      NA                <NA>                9
6  19W-0001-006      6   2018-02-15 20:59:00    2018-02-16 06:59:00   1  NA       1 2018-02-16 06:59:00               12
7  19W-0001-007      7   2018-02-16 09:40:00    2018-02-16 10:29:00  NA  NA      NA                <NA>                2
8  19W-0001-008      8   2018-02-16 12:59:00    2018-02-16 13:55:00  NA  NA      NA                <NA>                6
9  19W-0001-009      9   2018-02-16 16:28:00    2018-02-16 17:10:00  NA  NA      NA                <NA>                9
10 19W-0001-010     10   2018-02-16 19:45:00    2018-02-16 20:46:00  NA  NA      NA                <NA>               12
11 19W-0001-011     11   2018-02-16 21:54:00    2018-02-17 08:00:00  NA  NA      NA                <NA>               14
12 19W-0001-012     12   2018-02-17 09:23:00    2018-02-17 10:25:00  NA  NA      NA                <NA>               26
13 19W-0001-013     13   2018-02-17 14:58:00    2018-02-17 15:50:00  NA  NA      NA                <NA>               31
14 19W-0001-014     14   2018-02-17 18:26:00    2018-02-17 20:20:00  NA  NA      NA                <NA>               35
15 19W-0001-015     15   2018-02-17 23:00:00    2018-02-18 08:25:00   1  NA       1 2018-02-18 08:25:00               40
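The sinceLastMxTouch values above are whole hours, because subtracting two POSIXct vectors lets difftime() choose the units and as.integer() truncates the result. If minutes are wanted, as the question asks, one option is to call difftime() with units = "mins". The sketch below is a variant under stated assumptions, not this answer's code: it uses base R only, swaps fill()/lag() for findInterval(), and runs on the first seven rows of the sample data. The names mx_in, touch_idx, prev and last_touch are made up for illustration.

```r
# First seven rows of the sample data, parsed as UTC
mx_in <- as.POSIXct(c("2018-02-15 07:29:00", "2018-02-15 09:34:00",
                      "2018-02-15 12:07:00", "2018-02-15 14:36:00",
                      "2018-02-15 18:03:00", "2018-02-15 20:59:00",
                      "2018-02-16 09:40:00"), tz = "UTC")
mxTouch     <- c(0, NA, NA, NA, NA, 1, NA)
mxTouchDate <- as.POSIXct(c(1518682440, NA, NA, NA, NA, 1518764340, NA),
                          origin = "1970-01-01", tz = "UTC")

# For each row, count the touches that occur strictly before it (0 if none)
touch_idx <- which(!is.na(mxTouchDate))
prev      <- findInterval(seq_along(mx_in), touch_idx, left.open = TRUE)

# Look up the time of that previous touch; rows without one get NA
last_touch <- mxTouchDate[touch_idx[ifelse(prev == 0, NA, prev)]]

# Explicit units, so the result is in minutes regardless of the data
sinceLastMxTouch <- as.numeric(difftime(mx_in, last_touch, units = "mins"))

# The starting row (mxTouch == 0) is defined as 0
sinceLastMxTouch[!is.na(mxTouch) & mxTouch == 0] <- 0

sinceLastMxTouch
# 0 80 233 382 589 765 161
```

Dividing these values by 60 and truncating gives 1, 3, 6, 9, 12 and 2, which agrees with rows 2 to 7 of the table above.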

Sample data:

df <- structure(list(Leg_ID = c("19W-0001-001", "19W-0001-002", "19W-0001-003", 
"19W-0001-004", "19W-0001-005", "19W-0001-006", "19W-0001-007", 
"19W-0001-008", "19W-0001-009", "19W-0001-010", "19W-0001-011", 
"19W-0001-012", "19W-0001-013", "19W-0001-014", "19W-0001-015"
), SeqNum = 1:15, MX_IN_Local_Date_Time = c("2018-02-15 07:29:00", 
"2018-02-15 09:34:00", "2018-02-15 12:07:00", "2018-02-15 14:36:00", 
"2018-02-15 18:03:00", "2018-02-15 20:59:00", "2018-02-16 09:40:00", 
"2018-02-16 12:59:00", "2018-02-16 16:28:00", "2018-02-16 19:45:00", 
"2018-02-16 21:54:00", "2018-02-17 09:23:00", "2018-02-17 14:58:00", 
"2018-02-17 18:26:00", "2018-02-17 23:00:00"), MX_OUT_Local_Date_Time = c("2018-02-15 08:14:00", 
"2018-02-15 10:39:00", "2018-02-15 13:00:00", "2018-02-15 15:21:00", 
"2018-02-15 18:43:00", "2018-02-16 06:59:00", "2018-02-16 10:29:00", 
"2018-02-16 13:55:00", "2018-02-16 17:10:00", "2018-02-16 20:46:00", 
"2018-02-17 08:00:00", "2018-02-17 10:25:00", "2018-02-17 15:50:00", 
"2018-02-17 20:20:00", "2018-02-18 08:25:00"), RON = c(NA, NA, 
NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L), ROD = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), mxTouch = c(0L, 
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L), mxTouchDate = c(1518682440L, 
NA, NA, NA, NA, 1518764340L, NA, NA, NA, NA, NA, NA, NA, NA, 
1518942300L)), .Names = c("Leg_ID", "SeqNum", "MX_IN_Local_Date_Time", 
"MX_OUT_Local_Date_Time", "RON", "ROD", "mxTouch", "mxTouchDate"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))

Answer 2: (score: 0)

So I found a solution using fill() from the {tidyr} package. I also used lag() in a different way to finish the code. Here is my final working code.

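library(tibble)  # add_column()
library(dplyr)   # %>%, lag()
library(tidyr)   # fill()

# Flag a maintenance touch where RON or ROD is 1; the first leg gets 0
mxStaSet <- add_column(mxStaSet, mxTouch = ifelse(mxStaSet$RON == 1 | 
                          mxStaSet$ROD == 1, 1,NA))
mxStaSet$mxTouch <- ifelse(mxStaSet$SeqNum == 1, 0, mxStaSet$mxTouch)

# Record the MX_OUT time on touch rows, NA elsewhere
mxStaSet <- add_column(mxStaSet, mxTouchDate = 
                         ifelse(mxStaSet$mxTouch >= 0,
                                mxStaSet$MX_OUT_Local_Date_Time, NA))

# Carry the last touch time down into the NA rows
mxStaSet <- mxStaSet %>% fill(mxTouchDate)

# On touch rows (mxTouch == 1), use the previous touch time rather than the row's own
mxStaSet$mxTouchDate <- ifelse(mxStaSet$mxTouch == 0 | is.na(mxStaSet$mxTouch), 
                              mxStaSet$mxTouchDate, lag(mxStaSet$mxTouchDate))

# Elapsed time since the last touch, converted from seconds to hours
mxStaSet$mxTouchDate <- as.integer(mxStaSet$MX_OUT_Local_Date_Time - 
                                 mxStaSet$mxTouchDate)/60/60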

This returns my desired output (mxTouchDate now holds the hours since the last touch):

         Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch mxTouchDate
1  19W-0001-001      1   2018-02-15 07:29:00    2018-02-15 08:14:00  NA  NA       0    0.000000
2  19W-0001-002      2   2018-02-15 09:34:00    2018-02-15 10:39:00  NA  NA      NA    2.416667
3  19W-0001-003      3   2018-02-15 12:07:00    2018-02-15 13:00:00  NA  NA      NA    4.766667
4  19W-0001-004      4   2018-02-15 14:36:00    2018-02-15 15:21:00  NA  NA      NA    7.116667
5  19W-0001-005      5   2018-02-15 18:03:00    2018-02-15 18:43:00  NA  NA      NA   10.483333
6  19W-0001-006      6   2018-02-15 20:59:00    2018-02-16 06:59:00   1  NA       1   22.750000
7  19W-0001-007      7   2018-02-16 09:40:00    2018-02-16 10:29:00  NA  NA      NA    3.500000
8  19W-0001-008      8   2018-02-16 12:59:00    2018-02-16 13:55:00  NA  NA      NA    6.933333
9  19W-0001-009      9   2018-02-16 16:28:00    2018-02-16 17:10:00  NA  NA      NA   10.183333
10 19W-0001-010     10   2018-02-16 19:45:00    2018-02-16 20:46:00  NA  NA      NA   13.783333
11 19W-0001-011     11   2018-02-16 21:54:00    2018-02-17 08:00:00  NA  NA      NA   25.016667
12 19W-0001-012     12   2018-02-17 09:23:00    2018-02-17 10:25:00  NA  NA      NA   27.433333
13 19W-0001-013     13   2018-02-17 14:58:00    2018-02-17 15:50:00  NA  NA      NA   32.850000
14 19W-0001-014     14   2018-02-17 18:26:00    2018-02-17 20:20:00  NA  NA      NA   37.350000
15 19W-0001-015     15   2018-02-17 23:00:00    2018-02-18 08:25:00   1  NA       1   49.433333
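The fill-then-lag idea can also be checked against the small date/mxTouch example from the question. The following is a minimal sketch in base R, assuming the first row is always a touch (as it is in that example). The cummax() index trick stands in for tidyr's fill() and the c(NA, head(...)) shift stands in for dplyr's lag(); the names touch_date, last_seen, filled and previous are illustrative only.

```r
# Toy example from the question
date    <- 1:10
mxTouch <- c(0, NA, NA, 1, NA, NA, NA, NA, 1, NA)

# Date of each touch, NA elsewhere
touch_date <- ifelse(is.na(mxTouch), NA, date)

# Carry the last non-NA value forward: index of the most recent touch row so far
# (assumes row 1 is a touch, otherwise the index would start at 0)
last_seen <- cummax(ifelse(is.na(touch_date), 0L, seq_along(touch_date)))
filled    <- touch_date[last_seen]

# Shift down by one row so a touch row sees the touch before it
previous <- c(NA, head(filled, -1))

# Row 1 has no previous touch and is defined as 0
sinceLastMxTouch <- ifelse(is.na(previous), 0, date - previous)

sinceLastMxTouch
# 0 1 2 3 1 2 3 4 5 1
```

The result matches the sinceLastMxTouch column in the question's example, including the touch rows (dates 4 and 9), which count from the touch before them rather than from themselves.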

Thanks again for all the input.