Question

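I am trying to write a function that batch processes a folder of CSV files. All of the CSV files contain incorrect timestamps, so I have another file that holds the difference between each incorrect timestamp and the corresponding correct one. For example, my files look like this: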

I tried to create an if statement that adds the difference when the ID and Visit match.

df1
ID        Visit    Difference (in seconds)
1002      V2       35
2038      V1       86786

df2
ID        Visit    startTime
1002      V2       2017-12-01 19:47:11
1002      V2       2017-12-01 19:49:55
1002      V2       2017-12-01 19:50:42
1002      V2       2017-12-01 20:18:24

...

Instead, it adds 35 seconds, then 86786 seconds, then 35 again, and so on down the rows. This is the statement I used:

if (df1$ID == df2$ID &
      df1$Visit == df2$Visit) {
    df2$startTime <- df2$startTime + df1$Difference
  }

I want it to add only the 35 seconds. Is there a way to do that?

Answer 1

I think this will help.
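As background, here is a small sketch of why the if version alternates between 35 and 86786. R recycles the shorter vector when two vectors of different lengths are added, so a length-2 `Difference` column is simply repeated down the longer `startTime` column regardless of ID. The variable names below are made up for illustration, and the values are copied from the question. Separately, `if` expects a single TRUE/FALSE; with a vector condition, older R versions use only the first element with a warning, and R 4.2.0 and later raise an error.

```r
# Toy vectors mirroring the question (names are illustrative only)
difference <- c(35, 86786)  # plays the role of df1$Difference, length 2
start_time <- as.POSIXct(
  c("2017-12-01 19:47:11", "2017-12-01 19:49:55",
    "2017-12-01 19:50:42", "2017-12-01 20:18:24"),
  tz = "UTC"
)

# The shorter vector is recycled along the longer one,
# so the two offsets alternate row by row
recycled <- start_time + difference

# Seconds that were actually added to each row
added <- as.numeric(difftime(recycled, start_time, units = "secs"))
print(added)
```

Matching rows by key first, as in the join that follows, avoids relying on row position.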

# load packages
library(dplyr)
library(lubridate)
# reproduce similar data
df1 <-
  data.frame(
    "ID" = c(1002, 2038),
    "Visit" = as.character(c("V2", "V1")),
    "Difference" = c(35, 86786)
  )
df2 <-
  data.frame(
    "ID" = c(rep(1002, 3), 2038),
    Visit = as.character(rep("V2", 4)),
    startTime = ymd_hms(
      "2017-12-01 19:47:11",
      "2017-12-01 19:49:55",
      "2017-12-01 19:50:42",
      "2017-12-01 20:18:24"
    )
  )
# join before adding time
df <- left_join(df2, df1, by = c("ID", "Visit"))
df %>%
  mutate(new_time = if_else(!is.na(Difference),
                            startTime + Difference,
                            startTime))
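Since the goal was to process a whole folder, the same join could be wrapped in a function and applied to each file. The following is only a sketch under assumptions: `fix_times()` and `fix_folder()` are hypothetical helper names, every CSV is assumed to have `ID`, `Visit` and `startTime` columns with `startTime` in year-month-day hour:minute:second form, and the offsets table is assumed to have exactly one row per ID/Visit pair (duplicates would multiply rows in the join).

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: correct one data frame using the offsets table.
# Rows without a matching ID/Visit keep their original startTime.
fix_times <- function(visits, offsets) {
  visits %>%
    left_join(offsets, by = c("ID", "Visit")) %>%
    mutate(startTime = if_else(!is.na(Difference),
                               startTime + Difference,
                               startTime)) %>%
    select(-Difference)
}

# Hypothetical batch wrapper: reads every CSV in `folder`, corrects it,
# and writes a file with the same name into `out_folder`.
fix_folder <- function(folder, offsets, out_folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  for (f in files) {
    visits <- read.csv(f, stringsAsFactors = FALSE)
    visits$startTime <- ymd_hms(visits$startTime)  # parsed as UTC
    fixed <- fix_times(visits, offsets)
    # Write timestamps in an explicit format so the time part is always kept
    fixed$startTime <- format(fixed$startTime, "%Y-%m-%d %H:%M:%S")
    write.csv(fixed, file.path(out_folder, basename(f)), row.names = FALSE)
  }
  invisible(files)
}
```

It would be called as something like `fix_folder("raw", df1, "fixed")`, where both folder names are placeholders. Writing to a separate output folder leaves the original files untouched in case the offsets need to be revised.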

