I am exporting data from a medical records platform.
The data looks like this:
Date.time TEMP HR RR SBP DBP
1 Jun-08-2015
2 1323 36.8 O – – – –
3 931 36.8 O 76 MC 22 SP 104 MC 52 MC
4 930 – – – – –
5 929 – – – – –
6 813 36.8 O 76 MC 22 SP 104 MC 52 MC
7 126 36.3 O 78 MC 23 SP 112 MC 55 MC
8 40 36.3 O 78 MC 23 SP 112 MC 55 MC
9 Jun-07-2015
10 2307 36 O 71 MC 22 SP 120 MC 57 MC
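I need to show the date and time in a single column, in the format yyyymmddhhmm.
Here 1323, 931, 930, 929, etc. are the corresponding times (hhmm, without leading zeros).
My expected output is: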
Date.time TEMP HR RR SBP DBP
1 201506081323 36.8 O – – – –
2 201506080931 36.8 O 76 MC 22 SP 104 MC 52 MC
3 201506080930 – – – – –
4 201506080929 – – – – –
5 201506080813 36.8 O 76 MC 22 SP 104 MC 52 MC
6 201506080126 36.3 O 78 MC 23 SP 112 MC 55 MC
7 201506080040 36.3 O 78 MC 23 SP 112 MC 55 MC
8 201506072307 36 O 71 MC 22 SP 120 MC 57 MC
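Answer 0: (score: 1)
Split Date.time into a date part and a time part, fill the missing dates downward, then paste date and time together and convert the result to a date-time class.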
library(dplyr)
library(tidyr)
library(stringr)
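df1 %>%
  # x1 holds the date header rows, x2 holds the time rows
  mutate(x1 = if_else(nchar(Date.time) > 4, Date.time, NA_character_),
         x2 = if_else(nchar(Date.time) > 4, NA_character_, Date.time),
         # left-pad times to four digits, e.g. "40" becomes "0040"
         x2 = str_pad(x2, width = 4, side = "left", pad = "0")) %>%
  # carry each date down to the time rows below it
  fill(x1) %>%
  # drop the date header rows
  filter(!is.na(x2)) %>%
  mutate(Date.time.v1 = as.POSIXct(paste(x1, x2), format = "%b-%d-%Y %H%M")) %>%
  select(-c(x1, x2))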
# Date.time TEMP HR RR SBP DBP Date.time.v1
# 1 1323 36.8 O - - - - 2015-06-08 13:23:00
# 2 931 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 09:31:00
# 3 930 - - - - - 2015-06-08 09:30:00
# 4 929 - - - - - 2015-06-08 09:29:00
# 5 813 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 08:13:00
# 6 126 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 01:26:00
# 7 40 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 00:40:00
# 8 2307 36 O 71 MC 22 SP 120 MC 57 MC 2015-06-07 23:07:00
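The pipeline above produces a POSIXct column, while the question asks for a yyyymmddhhmm value. A minimal sketch of that last conversion, assuming a column shaped like Date.time.v1; the names dt and stamp and the UTC time zone are illustrative assumptions, not part of the original answer:

```r
# Hypothetical stand-in for the Date.time.v1 column
dt <- as.POSIXct(c("2015-06-08 13:23", "2015-06-08 00:40", "2015-06-07 23:07"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

# Render each timestamp as a 12-character yyyymmddhhmm string
stamp <- format(dt, "%Y%m%d%H%M")
stamp
```

Inside the dplyr pipeline this would probably be one more mutate() step that applies format() to Date.time.v1.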
# Sample data (df1) used above; fields must be tab-separated for sep = "\t" to work
df1 <- read.table(text = "
Date.time TEMP HR RR SBP DBP
Jun-08-2015
1323 36.8 O - - - -
931 36.8 O 76 MC 22 SP 104 MC 52 MC
930 - - - - -
929 - - - - -
813 36.8 O 76 MC 22 SP 104 MC 52 MC
126 36.3 O 78 MC 23 SP 112 MC 55 MC
40 36.3 O 78 MC 23 SP 112 MC 55 MC
Jun-07-2015
2307 36 O 71 MC 22 SP 120 MC 57 MC
", header = TRUE, sep = "\t", fill = TRUE, stringsAsFactors = FALSE)
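Answer 1: (score: 1)
This is what I came up with, although I still have to take the file back into Excel to separate the date from the time. That does not take long (about a minute). All the files I plan to use are roughly the same length, so it is not a big problem.
After doing that, I ended up with a file like this: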
X Date.time TEMP HR RR SBP DBP
1 NA
2 Jun-08-2015 1323 36.8 O – – – –
3 Jun-08-2015 931 36.8 O 76 MC 22 SP 104 MC 52 MC
4 Jun-08-2015 930 – – – – –
5 Jun-08-2015 929 – – – – –
6 Jun-08-2015 813 36.8 O 76 MC 22 SP 104 MC 52 MC
7 Jun-08-2015 126 36.3 O 78 MC 23 SP 112 MC 55 MC
8 Jun-08-2015 40 36.3 O 78 MC 23 SP 112 MC 55 MC
9 NA
10 Jun-07-2015 2307 36 O 71 MC 22 SP 120 MC 57 MC
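For reference, the manual Excel step could likely be reproduced in base R. A minimal sketch, assuming the raw export is a single character column in which date header rows are longer than four characters and the first row is a date; raw, is_date, date_filled and SJ_sketch are hypothetical names:

```r
# Hypothetical raw column: date header rows mixed with hhmm times
raw <- c("Jun-08-2015", "1323", "931", "40", "Jun-07-2015", "2307")

# Date header rows are the entries longer than four characters
is_date <- nchar(raw) > 4

# Carry each date header down to the rows beneath it
# (assumes the first entry is a date header)
date_filled <- raw[is_date][cumsum(is_date)]

# Keep only the time rows, each now paired with its date
SJ_sketch <- data.frame(X = date_filled[!is_date],
                        Date.time = raw[!is_date],
                        stringsAsFactors = FALSE)
SJ_sketch
```

The cumsum() call numbers each block of rows by the date header that opens it, so indexing the date headers by that number repeats each date for its own block.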
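Afterwards I used the following code. Apologies for all the comments; I need to make the code as easy to understand as possible so that everyone in my lab can follow what is going on.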
# Remove the empty rows
SJ <- na.omit(SJ)
# Parse the date text (e.g. "Jun-08-2015") into a date-time object
SJ$newdate <- strptime(as.character(SJ$X), "%b-%d-%Y")
# Strip the dashes from the date, leaving yyyymmdd
SJ$newdate <- gsub("[[:punct:]]", "", SJ$newdate)
# Add a column of "0000" to pad the date out to the full yyyymmddhhmm length
SJ$zeros <- rep("0000", nrow(SJ))
# Append the zeros to the date, giving yyyymmdd0000
SJ$date <- paste(SJ$newdate, SJ$zeros, sep = "")
# Convert the time column to a number
SJ$Date.time <- as.numeric(SJ$Date.time)
# Convert the date column to a number
SJ$date <- as.numeric(SJ$date)
# Add the time to the date to get the desired yyyymmddhhmm value; stored as a vector
Datetime <- SJ$date + SJ$Date.time
# Insert Datetime as the first column
SJ <- cbind(Datetime, SJ)
The file now looks like this:
Datetime X Date.time TEMP HR RR SBP DBP newdate zeros date
2 201506081323 Jun-08-2015 1323 36.8 O – – – – 20150608 0000 201506080000
3 201506080931 Jun-08-2015 931 36.8 O 76 MC 22 SP 104 MC 52 MC 20150608 0000 201506080000
4 201506080930 Jun-08-2015 930 – – – – – 20150608 0000 201506080000
5 201506080929 Jun-08-2015 929 – – – – – 20150608 0000 201506080000
6 201506080813 Jun-08-2015 813 36.8 O 76 MC 22 SP 104 MC 52 MC 20150608 0000 201506080000
7 201506080126 Jun-08-2015 126 36.3 O 78 MC 23 SP 112 MC 55 MC 20150608 0000 201506080000
8 201506080040 Jun-08-2015 40 36.3 O 78 MC 23 SP 112 MC 55 MC 20150608 0000 201506080000
10 201506072307 Jun-07-2015 2307 36 O 71 MC 22 SP 120 MC 57 MC 20150607 0000 201506070000
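As an alternative to adding the numeric date and time, the same yyyymmddhhmm value could be assembled as text, which avoids the zeros column. This is only a sketch under stated assumptions: date_txt and time_num are hypothetical stand-ins for the X and Date.time columns, and %b assumes English month abbreviations in the current locale.

```r
# Hypothetical inputs shaped like the X and Date.time columns
date_txt <- c("Jun-08-2015", "Jun-08-2015", "Jun-07-2015")
time_num <- c(1323, 40, 2307)

# Parse the date (%b assumes English month abbreviations in the current locale)
ymd <- format(as.Date(date_txt, format = "%b-%d-%Y"), "%Y%m%d")

# Left-pad the time to four digits, then join the two pieces as text
hhmm <- sprintf("%04d", as.integer(time_num))
datetime_chr <- paste0(ymd, hhmm)
datetime_chr
```

The result is a character vector; as.numeric() can be applied afterwards if a number is needed.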
Finally, I just removed the unnecessary columns: X, Date.time, newdate, zeros, and date.
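That clean-up step might look something like the following in base R. SJ_demo is a hypothetical two-row miniature of the data frame, used only so the sketch runs on its own:

```r
# Hypothetical miniature of SJ with the same helper columns
SJ_demo <- data.frame(Datetime = c(201506081323, 201506080931),
                      X = "Jun-08-2015", Date.time = c(1323, 931),
                      TEMP = "36.8 O", newdate = "20150608",
                      zeros = "0000", date = 201506080000,
                      stringsAsFactors = FALSE)

# Drop the helper columns by name, keeping everything else
drop_cols <- c("X", "Date.time", "newdate", "zeros", "date")
SJ_demo <- SJ_demo[, !(names(SJ_demo) %in% drop_cols), drop = FALSE]
names(SJ_demo)
```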
Thank you all for your help!