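I have a dataset of dates and maintenance eligibility flags. I need to get the time elapsed since the last maintenance touch. I read about shift(), but the number of rows I need to look back is variable. Here is a sample dataset.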
Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch mxTouchDate
1 19W-0001-001 1 2018-02-15 07:29:00 2018-02-15 08:14:00 NA NA 0 1518682440
2 19W-0001-002 2 2018-02-15 09:34:00 2018-02-15 10:39:00 NA NA NA NA
3 19W-0001-003 3 2018-02-15 12:07:00 2018-02-15 13:00:00 NA NA NA NA
4 19W-0001-004 4 2018-02-15 14:36:00 2018-02-15 15:21:00 NA NA NA NA
5 19W-0001-005 5 2018-02-15 18:03:00 2018-02-15 18:43:00 NA NA NA NA
6 19W-0001-006 6 2018-02-15 20:59:00 2018-02-16 06:59:00 1 NA 1 1518764340
7 19W-0001-007 7 2018-02-16 09:40:00 2018-02-16 10:29:00 NA NA NA NA
8 19W-0001-008 8 2018-02-16 12:59:00 2018-02-16 13:55:00 NA NA NA NA
9 19W-0001-009 9 2018-02-16 16:28:00 2018-02-16 17:10:00 NA NA NA NA
10 19W-0001-010 10 2018-02-16 19:45:00 2018-02-16 20:46:00 NA NA NA NA
11 19W-0001-011 11 2018-02-16 21:54:00 2018-02-17 08:00:00 NA NA NA NA
12 19W-0001-012 12 2018-02-17 09:23:00 2018-02-17 10:25:00 NA NA NA NA
13 19W-0001-013 13 2018-02-17 14:58:00 2018-02-17 15:50:00 NA NA NA NA
14 19W-0001-014 14 2018-02-17 18:26:00 2018-02-17 20:20:00 NA NA NA NA
15 19W-0001-015 15 2018-02-17 23:00:00 2018-02-18 08:25:00 1 NA 1 1518942300
I know how to do this in a loop, but I'm trying to avoid loops in R. Basically, what I'm looking for is:
If mxTouch = NA then MX_IN_Local_Date_time - previous mxTouchDate
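My final output should be the total number of minutes since the previous mxTouchDate. Thanks for your help.
To clarify, here is an example of what I'm looking for.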
date mxTouch sinceLastMxTouch
1 1 0 0
2 2 NA 1
3 3 NA 2
4 4 1 3
5 5 NA 1
6 6 NA 2
7 7 NA 3
8 8 NA 4
9 9 1 5
10 10 NA 1
When mxTouch == 0, sinceLastMxTouch = 0. When mxTouch is NA or 1, sinceLastMxTouch = date minus the date on which the last mxTouch occurred.
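Answer 0 (score: 1)
The lag() function from the dplyr library is most likely what you're looking for. I'm going to assume that the previous entry is always the row directly before it (i.e., the data frame is sorted appropriately).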
library(dplyr) # lag() below is dplyr's version, not stats::lag()
df <- data.frame(val=1:10,other=c(5,5,5,5,NA,5,5,5,5,NA))
df
# val other
#1 1 5
#2 2 5
#3 3 5
#4 4 5
#5 5 NA
#6 6 5
#7 7 5
#8 8 5
#9 9 5
#10 10 NA
#takes the difference of the current element by the previous one
differences <- df$val - lag(df$val) ## except for the first element, all the differences here should be 1, the first element is NA since there is nothing before it
df$other[is.na(df$other)] <- differences[is.na(df$other)] ## select the differences at the NA values only
df
# val other
#1 1 5
#2 2 5
#3 3 5
#4 4 5
#5 5 1
#6 6 5
#7 7 5
#8 8 5
#9 9 5
#10 10 1
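Alternatively, you can do this in base R with diff(). The advantage of lag() is that when you pass it a vector of size N, it returns a vector of size N, whereas diff() gives you a vector of size N-1. With the latter you just need to make sure the indices line up when you replace the values.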
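As a rough sketch of the diff() route, assuming the same toy `df` as above: padding the result with a leading NA brings it back to length N, so the same logical index can be reused. The `na_rows` name is only for illustration.

```r
df <- data.frame(val = 1:10, other = c(5, 5, 5, 5, NA, 5, 5, 5, 5, NA))

# diff() returns N-1 values, so pad the front with NA to realign with the rows
differences <- c(NA, diff(df$val))

# replace only the NA entries of `other` with the matching difference
na_rows <- is.na(df$other)
df$other[na_rows] <- differences[na_rows]
df
```

With the padding in place, rows 5 and 10 receive the same differences as in the lag() version.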
Answer 1 (score: 1)
Hope this helps!
library(lubridate)
library(dplyr)
library(tidyr)
#library(zoo)
#convert to timestamp datatype
df$MX_IN_Local_Date_Time <- ymd_hms(df$MX_IN_Local_Date_Time)
df$mxTouchDate <- as.POSIXct(df$mxTouchDate, origin="1970-01-01", tz="GMT")
df %>%
  mutate(mxTouchDate_temp = mxTouchDate) %>%
  fill(mxTouchDate_temp) %>%
  #mutate(mxTouchDate_temp = na.locf(mxTouchDate)) %>%
  # subtracting date-times picks its units automatically (hours for this data); as.integer() then truncates
  mutate(sinceLastMxTouch = ifelse((is.na(mxTouch) | mxTouch == 1), as.integer(MX_IN_Local_Date_Time - lag(mxTouchDate_temp)), mxTouch)) %>%
  select(-mxTouchDate_temp)
Output is:
Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch mxTouchDate sinceLastMxTouch
1 19W-0001-001 1 2018-02-15 07:29:00 2018-02-15 08:14:00 NA NA 0 2018-02-15 08:14:00 0
2 19W-0001-002 2 2018-02-15 09:34:00 2018-02-15 10:39:00 NA NA NA <NA> 1
3 19W-0001-003 3 2018-02-15 12:07:00 2018-02-15 13:00:00 NA NA NA <NA> 3
4 19W-0001-004 4 2018-02-15 14:36:00 2018-02-15 15:21:00 NA NA NA <NA> 6
5 19W-0001-005 5 2018-02-15 18:03:00 2018-02-15 18:43:00 NA NA NA <NA> 9
6 19W-0001-006 6 2018-02-15 20:59:00 2018-02-16 06:59:00 1 NA 1 2018-02-16 06:59:00 12
7 19W-0001-007 7 2018-02-16 09:40:00 2018-02-16 10:29:00 NA NA NA <NA> 2
8 19W-0001-008 8 2018-02-16 12:59:00 2018-02-16 13:55:00 NA NA NA <NA> 6
9 19W-0001-009 9 2018-02-16 16:28:00 2018-02-16 17:10:00 NA NA NA <NA> 9
10 19W-0001-010 10 2018-02-16 19:45:00 2018-02-16 20:46:00 NA NA NA <NA> 12
11 19W-0001-011 11 2018-02-16 21:54:00 2018-02-17 08:00:00 NA NA NA <NA> 14
12 19W-0001-012 12 2018-02-17 09:23:00 2018-02-17 10:25:00 NA NA NA <NA> 26
13 19W-0001-013 13 2018-02-17 14:58:00 2018-02-17 15:50:00 NA NA NA <NA> 31
14 19W-0001-014 14 2018-02-17 18:26:00 2018-02-17 20:20:00 NA NA NA <NA> 35
15 19W-0001-015 15 2018-02-17 23:00:00 2018-02-18 08:25:00 1 NA 1 2018-02-18 08:25:00 40
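The sinceLastMxTouch values above are whole hours, because subtracting two date-times lets R choose the units. Since the question asks for minutes, here is a minimal sketch of the same fill-then-lag idea with the units stated explicitly through difftime(). It uses only the first six rows of the sample data, and the names `legs`, `lastTouch` and `minsSinceLastTouch` are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# First six rows of the sample data, with every time parsed as UTC
legs <- data.frame(
  MX_IN_Local_Date_Time = as.POSIXct(
    c("2018-02-15 07:29:00", "2018-02-15 09:34:00", "2018-02-15 12:07:00",
      "2018-02-15 14:36:00", "2018-02-15 18:03:00", "2018-02-15 20:59:00"),
    tz = "UTC"),
  mxTouch = c(0L, NA, NA, NA, NA, 1L),
  mxTouchDate = as.POSIXct(
    c(1518682440, NA, NA, NA, NA, 1518764340),
    origin = "1970-01-01", tz = "UTC")
)

result <- legs %>%
  mutate(lastTouch = mxTouchDate) %>%
  fill(lastTouch) %>%                      # carry the last touch time downward
  mutate(
    minsSinceLastTouch = ifelse(
      !is.na(mxTouch) & mxTouch == 0,
      0,                                   # the starting row counts as zero
      # lag() makes a touch row measure from the previous touch, not itself
      as.numeric(difftime(MX_IN_Local_Date_Time, lag(lastTouch), units = "mins"))
    )
  ) %>%
  select(-lastTouch)

result
```

Stating `units = "mins"` keeps the result from switching between minutes, hours and days depending on how large the gaps in a particular dataset happen to be.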
Sample data:
df <- structure(list(Leg_ID = c("19W-0001-001", "19W-0001-002", "19W-0001-003",
"19W-0001-004", "19W-0001-005", "19W-0001-006", "19W-0001-007",
"19W-0001-008", "19W-0001-009", "19W-0001-010", "19W-0001-011",
"19W-0001-012", "19W-0001-013", "19W-0001-014", "19W-0001-015"
), SeqNum = 1:15, MX_IN_Local_Date_Time = c("2018-02-15 07:29:00",
"2018-02-15 09:34:00", "2018-02-15 12:07:00", "2018-02-15 14:36:00",
"2018-02-15 18:03:00", "2018-02-15 20:59:00", "2018-02-16 09:40:00",
"2018-02-16 12:59:00", "2018-02-16 16:28:00", "2018-02-16 19:45:00",
"2018-02-16 21:54:00", "2018-02-17 09:23:00", "2018-02-17 14:58:00",
"2018-02-17 18:26:00", "2018-02-17 23:00:00"), MX_OUT_Local_Date_Time = c("2018-02-15 08:14:00",
"2018-02-15 10:39:00", "2018-02-15 13:00:00", "2018-02-15 15:21:00",
"2018-02-15 18:43:00", "2018-02-16 06:59:00", "2018-02-16 10:29:00",
"2018-02-16 13:55:00", "2018-02-16 17:10:00", "2018-02-16 20:46:00",
"2018-02-17 08:00:00", "2018-02-17 10:25:00", "2018-02-17 15:50:00",
"2018-02-17 20:20:00", "2018-02-18 08:25:00"), RON = c(NA, NA,
NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L), ROD = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), mxTouch = c(0L,
NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L), mxTouchDate = c(1518682440L,
NA, NA, NA, NA, 1518764340L, NA, NA, NA, NA, NA, NA, NA, NA,
1518942300L)), .Names = c("Leg_ID", "SeqNum", "MX_IN_Local_Date_Time",
"MX_OUT_Local_Date_Time", "RON", "ROD", "mxTouch", "mxTouchDate"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
Answer 2 (score: 0)
So I found a solution using fill() from the {tidyr} package. I also used lag() in a different way to complete the code. Here is my final working code.
library(tibble) # add_column()
library(dplyr)  # lag(), %>%
library(tidyr)  # fill()
mxStaSet <- add_column(mxStaSet, mxTouch = ifelse(mxStaSet$RON == 1 |
                                                  mxStaSet$ROD == 1, 1, NA))
mxStaSet$mxTouch <- ifelse(mxStaSet$SeqNum == 1, 0, mxStaSet$mxTouch)
mxStaSet <- add_column(mxStaSet, mxTouchDate =
ifelse(mxStaSet$mxTouch >= 0,
mxStaSet$MX_OUT_Local_Date_Time, NA))
mxStaSet <- mxStaSet %>% fill(mxTouchDate)
mxStaSet$mxTouchDate <- ifelse(mxStaSet$mxTouch == 0 | is.na(mxStaSet$mxTouch),
mxStaSet$mxTouchDate, lag(mxStaSet$mxTouchDate))
mxStaSet$mxTouchDate <- as.integer(mxStaSet$MX_OUT_Local_Date_Time -
mxStaSet$mxTouchDate)/60/60
This returns the output I wanted (the mxTouchDate column now holds the hours since the last touch):
Leg_ID SeqNum MX_IN_Local_Date_Time MX_OUT_Local_Date_Time RON ROD mxTouch mxTouchDate
1 19W-0001-001 1 2018-02-15 07:29:00 2018-02-15 08:14:00 NA NA 0 0.000000
2 19W-0001-002 2 2018-02-15 09:34:00 2018-02-15 10:39:00 NA NA NA 2.416667
3 19W-0001-003 3 2018-02-15 12:07:00 2018-02-15 13:00:00 NA NA NA 4.766667
4 19W-0001-004 4 2018-02-15 14:36:00 2018-02-15 15:21:00 NA NA NA 7.116667
5 19W-0001-005 5 2018-02-15 18:03:00 2018-02-15 18:43:00 NA NA NA 10.483333
6 19W-0001-006 6 2018-02-15 20:59:00 2018-02-16 06:59:00 1 NA 1 22.750000
7 19W-0001-007 7 2018-02-16 09:40:00 2018-02-16 10:29:00 NA NA NA 3.500000
8 19W-0001-008 8 2018-02-16 12:59:00 2018-02-16 13:55:00 NA NA NA 6.933333
9 19W-0001-009 9 2018-02-16 16:28:00 2018-02-16 17:10:00 NA NA NA 10.183333
10 19W-0001-010 10 2018-02-16 19:45:00 2018-02-16 20:46:00 NA NA NA 13.783333
11 19W-0001-011 11 2018-02-16 21:54:00 2018-02-17 08:00:00 NA NA NA 25.016667
12 19W-0001-012 12 2018-02-17 09:23:00 2018-02-17 10:25:00 NA NA NA 27.433333
13 19W-0001-013 13 2018-02-17 14:58:00 2018-02-17 15:50:00 NA NA NA 32.850000
14 19W-0001-014 14 2018-02-17 18:26:00 2018-02-17 20:20:00 NA NA NA 37.350000
15 19W-0001-015 15 2018-02-17 23:00:00 2018-02-18 08:25:00 1 NA 1 49.433333
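The same idea can be checked against the small date/mxTouch example from the question. This is only a base-R sketch without loops, and the helper names `is_touch`, `last_idx` and `prev_idx` are made up for the illustration.

```r
date    <- 1:10
mxTouch <- c(0, NA, NA, 1, NA, NA, NA, NA, 1, NA)

# rows where a maintenance touch (0 or 1) is recorded
is_touch <- !is.na(mxTouch)

# index of the most recent touch at or before each row
last_idx <- cummax(ifelse(is_touch, seq_along(date), 0L))

# shift down by one row so that a touch row looks back to the previous touch
prev_idx <- c(NA, head(last_idx, -1))

sinceLastMxTouch <- date - date[prev_idx]

# the starting row (mxTouch == 0) is defined as zero
sinceLastMxTouch[is_touch & mxTouch == 0] <- 0

data.frame(date, mxTouch, sinceLastMxTouch)
```

The cummax() call plays the role of fill() here, and the one-row shift plays the role of lag().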
Thanks again for all the input.