I have a large data frame with incorrect entries in "DyStart" and "DyEnd"...
ID DyStart DyEnd TmStart TmEnd
1 04.12.2017 04.12.2017 10:10:00 10:50:00
2 01.12.2017 01.12.2017 12:27:00 16:29:00
3 27.11.2017 27.11.2017 14:31:00 15:08:00
4 07.12.2017 13:26 07.12.2017 13:26
I want all the dates in the "DyStart" and "DyEnd" columns and all the times in "TmStart" and "TmEnd". I have no problem with the dates...
df$DyStart <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y"),"%d.%m.%Y")
But I'm struggling with the times. I tried creating a new column and merging it with the old one...
df$TmStartNew <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y %H:%M"),"%H:%M:%S")
df$TmStart <- ifelse(is.na(df$TmStart), df$TmStartNew, df$TmStart)
I have tried different things, but I always get "numeric" or "integer" back, and I can't get the format back to H:M:S.
If anyone knows a solution, I would be very grateful!
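A likely explanation for the "numeric"/"integer" results, though it depends on how the data were read: TmStart may be a factor (the read.table() default before R 4.0.0), or ifelse() may receive POSIXct values. In either case ifelse() returns a plain vector built from its logical test and drops attributes such as factor levels and the POSIXct class. The sketch below uses hypothetical toy vectors. The names tm_factor, tm_posix, codes, secs and text are made up for illustration.

```r
# Hypothetical toy vectors, not the question's data
tm_factor <- factor(c("10:10:00", "12:27:00"))
tm_posix  <- as.POSIXct("07.12.2017 13:26", format = "%d.%m.%Y %H:%M", tz = "UTC")

# ifelse() starts from the logical test vector and copies in only the raw
# values of 'yes'/'no', so class and levels are lost:
codes <- ifelse(c(TRUE, TRUE), tm_factor, NA)  # factor  -> integer codes 1, 2
secs  <- ifelse(TRUE, tm_posix, NA)            # POSIXct -> plain number of seconds

# Converting the factor to character first keeps the "HH:MM:SS" text
text <- ifelse(c(TRUE, TRUE), as.character(tm_factor), NA)

print(codes)
print(secs)
print(text)
```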
Answer 0 (score: 2)
There are many ways to do this, but I prefer to follow the steps the OP already took to get the expected result.
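The problem seems to be that the modifications are not done in the expected order. The OP's code strips the time from DyStart before trying to extract the time from it, so by then there is no time left to extract.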
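The ordering problem can be reproduced on a single value. This is a minimal sketch that assumes a glued value like row 4 of the question and uses UTC as the time zone. The names dy, tm_first, dy_stripped and tm_after are illustrative.

```r
# One glued date-time value, as in row 4 of the question (tz = "UTC" is an assumption)
dy <- "07.12.2017 13:26"

# Extract the time first: this works
tm_first <- format(as.POSIXct(dy, format = "%d.%m.%Y %H:%M", tz = "UTC"), "%H:%M:%S")

# Strip the time from the date first (strptime ignores the trailing " 13:26"),
# then try to read the time from the already-stripped string: parsing fails
dy_stripped <- format(as.POSIXct(dy, format = "%d.%m.%Y", tz = "UTC"), "%d.%m.%Y")
tm_after <- format(as.POSIXct(dy_stripped, format = "%d.%m.%Y %H:%M", tz = "UTC"), "%H:%M:%S")

print(c(tm_first, dy_stripped, tm_after))
```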
Let me explain using the same example as in the OP.
s <- "ID, DyStart, DyEnd, TmStart, TmEnd
1, 04.12.2017, 04.12.2017, 10:10:00, 10:50:00
2, 01.12.2017, 01.12.2017, 12:27:00, 16:29:00
3, 27.11.2017, 27.11.2017, 14:31:00, 15:08:00
4, 07.12.2017 13:26, 07.12.2017 13:26"
#Create df as used in OP
df <- read.delim(textConnection(s), header = TRUE, sep = ",",
strip.white = TRUE, stringsAsFactors = FALSE)
#The data looks like this:
> df
ID DyStart DyEnd TmStart TmEnd
1 1 04.12.2017 04.12.2017 10:10:00 10:50:00
2 2 01.12.2017 01.12.2017 12:27:00 16:29:00
3 3 27.11.2017 27.11.2017 14:31:00 15:08:00
4 4 07.12.2017 13:26 07.12.2017 13:26
#First create new columns with Time part from corresponding Date column
df$TmStartNew <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y %H:%M"),"%H:%M:%S")
df$TmEndNew <- format(as.POSIXct(df$DyEnd,format="%d.%m.%Y %H:%M"),"%H:%M:%S")
#Replace valid values from new columns in original Tm columns
df$TmStart <- ifelse(!is.na(df$TmStartNew), df$TmStartNew, df$TmStart)
df$TmEnd <- ifelse(!is.na(df$TmEndNew), df$TmEndNew, df$TmEnd)
#Now modify Date columns to remove time part
df$DyStart <- format(as.POSIXct(df$DyStart,format="%d.%m.%Y"),"%d.%m.%Y")
df$DyEnd <- format(as.POSIXct(df$DyEnd,format="%d.%m.%Y"),"%d.%m.%Y")
#The data frame now contains:
> df
ID DyStart DyEnd TmStart TmEnd TmStartNew TmEndNew
1 1 04.12.2017 04.12.2017 10:10:00 10:50:00 <NA> <NA>
2 2 01.12.2017 01.12.2017 12:27:00 16:29:00 <NA> <NA>
3 3 27.11.2017 27.11.2017 14:31:00 15:08:00 <NA> <NA>
4 4 07.12.2017 07.12.2017 13:26:00 13:26:00 13:26:00 13:26:00
The TmStartNew and TmEndNew columns can now be dropped.
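Here is one way to drop them in base R. It is sketched on a hypothetical two-row stand-in for the data frame above.

```r
# Hypothetical stand-in for the data frame above, with the two helper columns
df <- data.frame(ID = 1:2,
                 TmStart = c("10:10:00", "13:26:00"),
                 TmStartNew = c(NA, "13:26:00"),
                 TmEndNew = c(NA, "13:26:00"),
                 stringsAsFactors = FALSE)

# Assigning NULL to a set of columns removes them from the data frame
df[c("TmStartNew", "TmEndNew")] <- NULL
print(names(df))
```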
As mentioned above, this solution follows the same lines as the OP's code, but there are other ways to achieve the same result.
Answer 1 (score: 1)
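You can use mutate from the dplyr package to add the new columns, and dmy_hms from the lubridate package to convert the strings to date-times after pasting the date and time together.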
It looks like this:
library(dplyr)
library(lubridate)
df %>%
mutate(tm_start_new = lubridate::dmy_hms(paste(DyStart, TmStart)),
tm_end_new = lubridate::dmy_hms(paste(DyEnd, TmEnd)))
For the first three rows of the example, this gives you:
# A tibble: 3 x 7
ID DyStart DyEnd TmStart TmEnd tm_start_new tm_end_new
<fctr> <fctr> <fctr> <fctr> <fctr> <dttm> <dttm>
1 1 04.12.2017 04.12.2017 10:10:00 10:50:00 2017-12-04 10:10:00 2017-12-04 10:50:00
2 2 01.12.2017 01.12.2017 12:27:00 16:29:00 2017-12-01 12:27:00 2017-12-01 16:29:00
3 3 27.11.2017 27.11.2017 14:31:00 15:08:00 2017-11-27 14:31:00 2017-11-27 15:08:00
Note: I converted the data frame to a tibble so that you can see the class really is a date-time object (<dttm>).
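For comparison, here is a base-R sketch of the same paste-then-parse idea without lubridate, assuming UTC as the time zone. It uses toy versions of rows 1 and 4 of the question. Row 4 shows the weak spot: its time already sits in the date column and its Tm value is NA, so it does not survive the paste.

```r
# Toy vectors modelled on rows 1 and 4 of the question (tz = "UTC" is an assumption)
dy <- c("04.12.2017", "07.12.2017 13:26")
tm <- c("10:10:00", NA)

# paste() turns the NA into the text "NA", so row 4 becomes "07.12.2017 13:26 NA"
combined <- paste(dy, tm)
dt <- as.POSIXct(combined, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Row 1 parses; row 4 does not match the format and comes back as NA
print(dt)
```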
Answer 2 (score: 0)
You can use grepl to find where DyStart and DyEnd contain the full date and time. The indexes i1 and i2 mark the rows that do not:
i1 <- !grepl('\\d{2}\\.\\d{2}\\.\\d{4} \\d{2}:\\d{2}', df$DyStart)
i2 <- !grepl('\\d{2}\\.\\d{2}\\.\\d{4} \\d{2}:\\d{2}', df$DyEnd)
Then you can complete those values by pasting in the time from the Tm columns:
df$DyStart[i1] <- paste(df$DyStart[i1], df$TmStart[i1])
df$DyEnd[i2] <- paste(df$DyEnd[i2], df$TmEnd[i2])
Next, you need to paste :00 onto the other rows:
df$DyStart[!i1] <- paste0(df$DyStart[!i1], ':00')
df$DyEnd[!i2] <- paste0(df$DyEnd[!i2], ':00')
Now you can convert the columns to a date-time format:
df[2:3] <- lapply(df[2:3], function(x) as.POSIXct(x, format = '%d.%m.%Y %H:%M:%S'))
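The steps above put the full date-time into DyStart and DyEnd. The question asks for dates in the Dy columns and times in the Tm columns. If you want that layout, format() can split a POSIXct value back into its two parts. Below is a minimal sketch on toy values, assuming UTC. The names dt, dy_start and tm_start are illustrative.

```r
# Toy date-times standing in for the converted DyStart column (tz is an assumption)
dt <- as.POSIXct(c("04.12.2017 10:10:00", "07.12.2017 13:26:00"),
                 format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Split each value back into a date string and a time string
dy_start <- format(dt, "%d.%m.%Y")
tm_start <- format(dt, "%H:%M:%S")

print(data.frame(dy_start, tm_start))
```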
Answer 3 (score: 0)
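To split the date and the time in the last row of the example, so that you can then convert them to dates and times, you can proceed as follows:
Your data: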
data <- read.table(text=
"'ID' 'DyStart' 'DyEnd' 'TmStart' 'TmEnd'
'1' '04.12.2017' '05.12.2017' '10:10:00' '10:50:00'
'2' '01.12.2017' '01.12.2017' '12:27:00' '16:29:00'
'3' '27.11.2017' '27.11.2017' '14:31:00' '15:08:00'
'4' '07.12.2017 13:26' '07.12.2017 13:26' '' ''", stringsAsFactors=F, header=T)
Function definitions and usage to clean up the dates and times:
fn_date <- function(columnDate){
  #Keep only the part before the space, i.e. the date ("07.12.2017 13:26" -> "07.12.2017")
  sapply(strsplit(columnDate, " "), `[`, 1)
}
fn_time <- function(columnDate, columnTime){
  #For entries that also contain a time, take the part after the space and add seconds
  parts <- strsplit(columnDate, " ")
  hasTime <- lengths(parts) == 2
  columnTime[hasTime] <- paste0(sapply(parts[hasTime], `[`, 2), ":00")
  return(columnTime)
}
data$TmStart <- fn_time(data$DyStart, data$TmStart)
data$TmEnd <- fn_time(data$DyEnd, data$TmEnd)
data[,2:3] <- lapply(data[,2:3], fn_date)
Then convert the columns with lubridate:
library(lubridate)
data[,2:3] <- lapply(data[,2:3], dmy)
data[,4:5] <- lapply(data[,4:5], hms)
The result is:
ID DyStart DyEnd TmStart TmEnd
1 1 2017-12-04 2017-12-05 10H 10M 0S 10H 50M 0S
2 2 2017-12-01 2017-12-01 12H 27M 0S 16H 29M 0S
3 3 2017-11-27 2017-11-27 14H 31M 0S 15H 8M 0S
4 4 2017-12-07 2017-12-07 13H 26M 0S 13H 26M 0S
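If you would rather not depend on lubridate for the time columns, base R's as.difftime() can turn "H:M:S" strings into durations since midnight. This is a sketch on toy values from TmStart; units = "mins" is an arbitrary choice.

```r
# Hypothetical toy values taken from the TmStart column of the example
tm <- c("10:10:00", "13:26:00")

# as.difftime() parses each string with the given format and returns
# its distance from midnight as a "difftime" object in the chosen units
durations <- as.difftime(tm, format = "%H:%M:%S", units = "mins")
print(durations)
```

Unlike hms(), which returns a lubridate Period, this gives a base difftime object.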
Answer 4 (score: 0)
I would write two small functions to solve your problem:
#Function to extract time from the dates and merge it with the time column:
Extract_Time=function(DATE,TIME){
where=grep("\\s",DATE)
DATE[where]=paste0(DATE[where],":00")#Assuming none of the times in your data include seconds
ifelse(is.na(TIME),format(strptime(DATE,'%d.%m.%Y %H:%M:%S'),'%H:%M:%S'),TIME)
}
#Function for the date column:
DATE=function(x)as.Date(x,'%d.%m.%Y')
transform(dat1,DyStart=DATE(DyStart),
DyEnd=DATE(DyEnd),
TmStart=Extract_Time(DyStart,TmStart),
TmEnd=Extract_Time(DyEnd,TmEnd))
ID DyStart DyEnd TmStart TmEnd
1 1 2017-12-04 2017-12-04 10:10:00 10:50:00
2 2 2017-12-01 2017-12-01 12:27:00 16:29:00
3 3 2017-11-27 2017-11-27 14:31:00 15:08:00
4 4 2017-12-07 2017-12-07 13:26:00 13:26:00
5 5 2017-12-08 2017-12-08 15:26:00 16:26:00
Data used:
dat1=read.table(text="ID DyStart DyEnd TmStart TmEnd
1 04.12.2017 04.12.2017 10:10:00 10:50:00
2 01.12.2017 01.12.2017 12:27:00 16:29:00
3 27.11.2017 27.11.2017 14:31:00 15:08:00
4 '07.12.2017 13:26' '07.12.2017 13:26' NA NA
5 '08.12.2017 15:26' '08.12.2017 16:26' NA NA ",header=TRUE,stringsAsFactors=FALSE)
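This works because transform() evaluates all of its arguments against the original columns of the data frame. DATE() converts DyStart and DyEnd in the same call, but Extract_Time() still sees the original strings. Here is a small sketch of that behaviour on a hypothetical one-column data frame.

```r
# Hypothetical two-row example, not the question's data
d <- data.frame(x = c("07.12.2017 13:26", "04.12.2017"), stringsAsFactors = FALSE)

# Both new values are computed from the ORIGINAL column x:
# y counts the characters of the original strings, not of the converted dates
out <- transform(d, x = as.Date(x, "%d.%m.%Y"), y = nchar(x))
str(out)
```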