Merging two (datetime) columns into one column in R

Time: 2017-12-27 15:48:15

Tags: r

I have a large data frame with incorrect entries in "DyStart" and "DyEnd":

Data frame:

ID  DyStart           DyEnd             TmStart     TmEnd
1   04.12.2017        04.12.2017        10:10:00    10:50:00
2   01.12.2017        01.12.2017        12:27:00    16:29:00
3   27.11.2017        27.11.2017        14:31:00    15:08:00
4   07.12.2017 13:26  07.12.2017 13:26      

I want all the dates in the columns "DyStart" and "DyEnd", and all the times in "TmStart" and "TmEnd". I have no problem with the dates:

df$DyStart <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y"),"%d.%m.%Y")

But I am struggling with the times. I tried to create a new column and merge it with the old one:

df$TmStartNew <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y %H:%M"),"%H:%M:%S")

df$TmStart <- ifelse(is.na(df$TmStart), df$TmStartNew, df$TmStart)

I tried different things, but I always get "numeric" or "integer" back, and I cannot get the format back to H:M:S.

If anyone knows a solution, I would be very grateful!

5 answers:

Answer 0: (score: 2)

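There are many ways to achieve this, but I prefer to follow the steps the OP already considered to reach the expected result.

It seems the modifications were not performed in the right order, which is what caused the OP's problem: the time has to be extracted from the date columns before the time part is stripped from them.

Let me explain with the same example used in the OP.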

s <- "ID,  DyStart,           DyEnd,             TmStart,     TmEnd
1,   04.12.2017,        04.12.2017,        10:10:00,    10:50:00
2,   01.12.2017,        01.12.2017,        12:27:00,    16:29:00
3,   27.11.2017,        27.11.2017,        14:31:00,    15:08:00
4,   07.12.2017 13:26,  07.12.2017 13:26"

#Create df as used in OP
df <- read.delim(textConnection(s), header = TRUE, sep = ",", 
strip.white = TRUE, stringsAsFactors = FALSE)
#data looks as
> df
  ID          DyStart            DyEnd  TmStart    TmEnd
1  1       04.12.2017       04.12.2017 10:10:00 10:50:00
2  2       01.12.2017       01.12.2017 12:27:00 16:29:00
3  3       27.11.2017       27.11.2017 14:31:00 15:08:00
4  4 07.12.2017 13:26 07.12.2017 13:26

#First create new columns with Time part from corresponding Date column
df$TmStartNew <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y %H:%M"),"%H:%M:%S")
df$TmEndNew <- format(as.POSIXct(df$DyEnd,format="%d.%m.%Y %H:%M"),"%H:%M:%S")

#Replace valid values from new columns in original Tm columns
df$TmStart <- ifelse(!is.na(df$TmStartNew), df$TmStartNew, df$TmStart)
df$TmEnd <- ifelse(!is.na(df$TmEndNew), df$TmEndNew, df$TmEnd)

#Now modify Date columns to remove time part
df$DyStart <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y"),"%d.%m.%Y")
df$DyEnd <- format(as.POSIXct(df$DyEnd,format="%d.%m.%Y"),"%d.%m.%Y")

#data frame will now contain
> df
  ID    DyStart      DyEnd  TmStart    TmEnd TmStartNew TmEndNew
1  1 04.12.2017 04.12.2017 10:10:00 10:50:00       <NA>     <NA>
2  2 01.12.2017 01.12.2017 12:27:00 16:29:00       <NA>     <NA>
3  3 27.11.2017 27.11.2017 14:31:00 15:08:00       <NA>     <NA>
4  4 07.12.2017 07.12.2017 13:26:00 13:26:00   13:26:00 13:26:00

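The TmStartNew and TmEndNew columns can now be dropped.

As mentioned above, this solution follows the same lines as the OP's attempt, but there are other ways to achieve the same result.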
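The steps above can also be wrapped in a small helper so that the temporary columns never reach the data frame. This is only a minimal sketch: the function name `split_datetime` is hypothetical, and `tz = "UTC"` is an assumption added here to make the parsing independent of the local time zone (the code above uses the session default).

```r
# Hypothetical helper: splits "dd.mm.yyyy HH:MM" entries into a date and a time.
split_datetime <- function(dy, tm) {
  dy <- as.character(dy)
  tm <- as.character(tm)
  # Time part taken from the date column; NA where the entry holds a date only
  tm_new <- format(as.POSIXct(dy, format = "%d.%m.%Y %H:%M", tz = "UTC"),
                   "%H:%M:%S")
  list(
    # strptime() ignores trailing input, so this strips any time part
    date = format(as.POSIXct(dy, format = "%d.%m.%Y", tz = "UTC"), "%d.%m.%Y"),
    # Prefer the extracted time, otherwise keep the existing time value
    time = ifelse(!is.na(tm_new), tm_new, tm)
  )
}

# Small sample in the shape of the OP's data
df <- data.frame(
  DyStart = c("04.12.2017", "07.12.2017 13:26"),
  TmStart = c("10:10:00", ""),
  stringsAsFactors = FALSE
)

res <- split_datetime(df$DyStart, df$TmStart)
df$DyStart <- res$date
df$TmStart <- res$time
```

The same call can be repeated for the DyEnd and TmEnd pair.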

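Answer 1: (score: 1)

You can use mutate from the dplyr package to add new columns, and dmy_hms from the lubridate package to convert the strings to date-times after pasting the date and time together.

It looks like this: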

library(dplyr)
library(lubridate)

df %>%
  mutate(tm_start_new = lubridate::dmy_hms(paste(DyStart, TmStart)),
         tm_end_new = lubridate::dmy_hms(paste(DyEnd, TmEnd)))

This gives you:

# A tibble: 3 x 7
      ID    DyStart      DyEnd  TmStart    TmEnd        tm_start_new          tm_end_new
  <fctr>     <fctr>     <fctr>   <fctr>   <fctr>              <dttm>              <dttm>
1      1 04.12.2017 04.12.2017 10:10:00 10:50:00 2017-12-04 10:10:00 2017-12-04 10:50:00
2      2 01.12.2017 01.12.2017 12:27:00 16:29:00 2017-12-01 12:27:00 2017-12-01 16:29:00
3      3 27.11.2017 27.11.2017 14:31:00 15:08:00 2017-11-27 14:31:00 2017-11-27 15:08:00

Note: I converted the data frame to a tibble so that you can see that the class of the new columns is actually a date-time object.
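The <fctr> column types in the output above hint at one plausible reason why the ifelse() call in the question returned integers: ifelse() drops attributes, so a factor comes back as its integer level codes. The following is a small sketch of that effect, assuming the time column was read as a factor and that its empty entries are "" rather than NA; the vectors `tm` and `tm_new` are made up for illustration.

```r
# Assumption: the time column is a factor and its missing entries are "", not NA
tm <- factor(c("10:10:00", "12:27:00", ""))
tm_new <- c(NA, NA, "13:26:00")

# Same shape as the call in the question: is.na(tm) is FALSE everywhere,
# so every value comes from the factor and only its integer codes survive
bad <- ifelse(is.na(tm), tm_new, tm)

# Testing the new column instead and converting the factor to character
# keeps the H:M:S strings
good <- ifelse(!is.na(tm_new), tm_new, as.character(tm))
```

Reading the data with stringsAsFactors = FALSE avoids the factor columns in the first place.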

Answer 2: (score: 0)

You can create an index with grepl to find where DyStart and DyEnd already contain a full date-time. Because of the negation, i1 and i2 are TRUE for the rows that hold a date only:

i1 <- !grepl('\\d{2}\\.\\d{2}\\.\\d{4} \\d{2}:\\d{2}', df$DyStart)
i2 <- !grepl('\\d{2}\\.\\d{2}\\.\\d{4} \\d{2}:\\d{2}', df$DyEnd)

Then you can complete those rows by pasting the time onto the date:

df$DyStart[i1] <- paste(df$DyStart[i1], df$TmStart[i1])
df$DyEnd[i2] <- paste(df$DyEnd[i2], df$TmEnd[i2])

Next, you need to paste :00 onto the other rows:

df$DyStart[!i1] <- paste0(df$DyStart[!i1], ':00')
df$DyEnd[!i2] <- paste0(df$DyEnd[!i2], ':00')

Now you can convert the columns to a date-time format:

df[2:3] <- lapply(df[2:3], function(x) as.POSIXct(x, format = '%d.%m.%Y %H:%M:%S'))

DyStart and DyEnd are then POSIXct date-time columns.
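The indexing idea in this answer can also be folded into one function that builds the full date-time string per row. This is a sketch only: the name `to_datetime` is hypothetical, `tz = "UTC"` is an assumption, and it presumes that combined entries always end in HH:MM without seconds, as in the sample data.

```r
# Hypothetical helper: returns one POSIXct vector from a date and a time column
to_datetime <- function(dy, tm, tz = "UTC") {
  dy <- as.character(dy)
  tm <- as.character(tm)
  # TRUE where the date entry already carries a trailing " HH:MM"
  has_time <- grepl(" \\d{2}:\\d{2}$", dy)
  # Add the missing seconds, or append the separate time column
  full <- ifelse(has_time, paste0(dy, ":00"), paste(dy, tm))
  as.POSIXct(full, format = "%d.%m.%Y %H:%M:%S", tz = tz)
}

dy <- c("04.12.2017", "07.12.2017 13:26")
tm <- c("10:10:00", "")
start <- to_datetime(dy, tm)
```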

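Answer 3: (score: 0)

To separate the date and the time in the last row of your example, so that they can be converted to dates and times, you can proceed as in the following example:

Your data: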

data <- read.table(text=
"'ID' 'DyStart' 'DyEnd' 'TmStart' 'TmEnd'
'1' '04.12.2017' '05.12.2017' '10:10:00' '10:50:00'
'2' '01.12.2017' '01.12.2017' '12:27:00' '16:29:00'
'3' '27.11.2017' '27.11.2017' '14:31:00' '15:08:00'
'4' '07.12.2017 13:26' '07.12.2017 13:26' '' ''", stringsAsFactors=F, header=T) 

Function definitions and their use to clean up the dates and times:

fn_date <- function(columnDate){
columnDate <- ifelse(lapply(strsplit(columnDate, " "), length)==2,
                  unlist(strsplit(columnDate, " ")[lapply(strsplit(columnDate, " "), length)==2])[1],
                  columnDate)
return(columnDate)
}

fn_time <- function(columnDate, columnTime){
columnTime <- ifelse(lapply(strsplit(columnDate, " "), length)==2,
                     paste0(unlist(strsplit(columnDate, " ")[lapply(strsplit(columnDate, " "), length)==2])[2],":00"),
                     columnTime)
return(columnTime)
}

data$TmStart <- fn_time(data$DyStart, data$TmStart)
data$TmEnd <- fn_time(data$DyEnd, data$TmEnd)
data[,2:3] <- lapply(data[,2:3], fn_date)

Convert the columns to the desired formats with the help of lubridate:
library(lubridate)
data[,2:3] <- lapply(data[,2:3], dmy)
data[,4:5] <- lapply(data[,4:5], hms)

The result is:

  ID    DyStart      DyEnd    TmStart      TmEnd
1  1 2017-12-04 2017-12-05 10H 10M 0S 10H 50M 0S
2  2 2017-12-01 2017-12-01 12H 27M 0S 16H 29M 0S
3  3 2017-11-27 2017-11-27 14H 31M 0S  15H 8M 0S
4  4 2017-12-07 2017-12-07 13H 26M 0S 13H 26M 0S
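One caveat about fn_date and fn_time as written: both take element [1] or [2] of the unlisted split, so if several rows contain a combined date and time, every such row receives the values of the first one. A vectorised alternative could look like the sketch below. The names `split_date` and `split_time` are hypothetical, and the code assumes the combined entries have the form "dd.mm.yyyy HH:MM" with a single space.

```r
# Hypothetical helpers working element-wise on character vectors

# Keep everything before the first space (the date part)
split_date <- function(x) sub(" .*$", "", x)

# Take the part after the space and add seconds; otherwise keep the given time
split_time <- function(x, tm) {
  has_time <- grepl(" ", x, fixed = TRUE)
  ifelse(has_time, paste0(sub("^.* ", "", x), ":00"), tm)
}

dy <- c("04.12.2017", "07.12.2017 13:26", "08.12.2017 15:26")
tm <- c("10:10:00", "", "")

dates <- split_date(dy)
times <- split_time(dy, tm)
```

The results are plain character vectors, so they can be passed on to dmy and hms as in the answer.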

Answer 4: (score: 0)

I would write two small functions to solve your problem:

 #Function to extract time from the dates and merge it with the time column:
  Extract_Time=function(DATE,TIME){
   where=grep("\\s",DATE)
   DATE[where]=paste0(DATE[where],":00") # assumes the combined entries never contain seconds
   ifelse(is.na(TIME),format(strptime(DATE,'%d.%m.%Y %H:%M:%S'),'%H:%M:%S'),TIME)
 }

 #Function for the date column:
 DATE=function(x)as.Date(x,'%d.%m.%Y')
 transform(dat1,DyStart=DATE(DyStart),
           DyEnd=DATE(DyEnd),
           TmStart=Extract_Time(DyStart,TmStart),
           TmEnd=Extract_Time(DyEnd,TmEnd))

  ID    DyStart      DyEnd  TmStart    TmEnd
1  1 2017-12-04 2017-12-04 10:10:00 10:50:00
2  2 2017-12-01 2017-12-01 12:27:00 16:29:00
3  3 2017-11-27 2017-11-27 14:31:00 15:08:00
4  4 2017-12-07 2017-12-07 13:26:00 13:26:00
5  5 2017-12-08 2017-12-08 15:26:00 16:26:00

Data used:

dat1=read.table(text="ID  DyStart  DyEnd    TmStart  TmEnd
1   04.12.2017        04.12.2017        10:10:00    10:50:00
2   01.12.2017        01.12.2017        12:27:00    16:29:00
3   27.11.2017        27.11.2017        14:31:00    15:08:00
4   '07.12.2017 13:26' '07.12.2017 13:26' NA NA    
5   '08.12.2017 15:26' '08.12.2017 16:26' NA NA    ",h=T,stringsAsFactor=F)