I am trying to write a function that batch-processes a folder of CSV files. Every CSV file contains incorrect timestamps, so I have a second file that lists the difference between each incorrect timestamp and the corresponding correct one. For example, my files look like this:
df1
ID    Visit  Difference (in seconds)
1002  V2     35
2038  V1     86786
df2
ID    Visit  startTime
1002  V2     2017-12-01 19:47:11
1002  V2     2017-12-01 19:49:55
1002  V2     2017-12-01 19:50:42
1002  V2     2017-12-01 20:18:24
...
I tried to write an if statement that adds the difference when the ID and Visit match:
if (df1$ID == df2$ID &
    df1$Visit == df2$Visit) {
  df2$startTime <- df2$startTime + df1$Difference
}
It repeatedly adds 35 seconds, then 86786 seconds, then 35 again, and so on. I want it to add only 35 seconds. Is there a way to do this?
Answer 0 (score: 1)
I think this can help you
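For context, the alternating 35 / 86786 pattern most likely comes from R's vector recycling: `df1$Difference` has two elements, so adding it to a four-row `startTime` column reuses the two values in turn. The `if` condition is also a vector, not a single TRUE/FALSE (older R versions use only its first element with a warning, and as far as I know R 4.2.0 and later raise an error). A minimal sketch with stand-in data modelled on the question; the variable names `shifted` and `offsets` are mine:

```r
# Stand-in data modelled on the question (timestamps assumed to be UTC)
df1 <- data.frame(ID = c(1002, 2038), Visit = c("V2", "V1"),
                  Difference = c(35, 86786), stringsAsFactors = FALSE)
df2 <- data.frame(
  ID = rep(1002, 4), Visit = rep("V2", 4),
  startTime = as.POSIXct(c("2017-12-01 19:47:11", "2017-12-01 19:49:55",
                           "2017-12-01 19:50:42", "2017-12-01 20:18:24"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Difference has 2 values and startTime has 4, so R recycles the
# 2 values across the 4 rows instead of matching them by ID and Visit
shifted <- df2$startTime + df1$Difference

# Seconds actually added to each row
offsets <- as.numeric(difftime(shifted, df2$startTime, units = "secs"))
print(offsets)
```

Every second row gets the offset that belongs to ID 2038, which is why the rows have to be matched up first, as in the code that follows.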
# load packages
library(dplyr)
library(lubridate)

# reproduce similar data
df1 <- data.frame(
  ID = c(1002, 2038),
  Visit = c("V2", "V1"),
  Difference = c(35, 86786),  # offset in seconds
  stringsAsFactors = FALSE    # keep Visit as character in R < 4.0
)
df2 <- data.frame(
  ID = c(rep(1002, 3), 2038),
  Visit = rep("V2", 4),
  startTime = ymd_hms(
    "2017-12-01 19:47:11",
    "2017-12-01 19:49:55",
    "2017-12-01 19:50:42",
    "2017-12-01 20:18:24"
  ),
  stringsAsFactors = FALSE
)

# join before adding time, so each row carries its own Difference
df <- left_join(df2, df1, by = c("ID", "Visit"))

# add the offset where a match exists; rows without a match (NA) stay unchanged
df %>%
  mutate(new_time = if_else(!is.na(Difference),
                            startTime + Difference,
                            startTime))
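Since the goal was to process a whole folder of CSV files, the same join can be wrapped in a function and applied to each file. This is only a rough sketch: `fix_timestamps` and `fix_folder` are hypothetical helper names, and it assumes every CSV has `ID`, `Visit` and `startTime` columns, with `startTime` stored as "YYYY-MM-DD HH:MM:SS" text.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: shift startTime in one data frame.
# Assumes `data` has ID, Visit, startTime (POSIXct) and
# `offsets` has ID, Visit, Difference (in seconds).
fix_timestamps <- function(data, offsets) {
  data %>%
    left_join(offsets, by = c("ID", "Visit")) %>%
    mutate(startTime = if_else(!is.na(Difference),
                               startTime + Difference,
                               startTime)) %>%
    select(-Difference)
}

# Hypothetical batch wrapper: read every CSV in `in_dir`,
# correct it, and write the result to `out_dir` under the same file name.
fix_folder <- function(in_dir, out_dir, offsets) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  for (f in files) {
    data <- read.csv(f, stringsAsFactors = FALSE)
    data$startTime <- ymd_hms(data$startTime)  # parse text to POSIXct (UTC)
    fixed <- fix_timestamps(data, offsets)
    # write timestamps back as text in the original layout
    fixed$startTime <- format(fixed$startTime, "%Y-%m-%d %H:%M:%S")
    write.csv(fixed, file.path(out_dir, basename(f)), row.names = FALSE)
  }
  invisible(files)
}
```

With the offsets table loaded as `df1`, a call could look like `fix_folder("raw_csv", "fixed_csv", df1)`, where both folder names are placeholders. Writing to a separate output folder keeps the original files untouched, so the correction cannot be applied twice by accident.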