I have a huge data frame in RStudio, with about 10 million rows, in the following format:
ID DATE reading
100845 2014-08-17 0,0,0,0,3,0,0,0,0,1,1,0,0,0,0,0,2,0,0,1,0,0,2,0
100845 2014-08-18 0,0,4,0,0,0,0,1,0,0,0,0,1,1,1,1,0,1,1,2,1,1,0,1
100845 2014-08-19 0,1,0,1,0,1,1,1,2,0,1,0,1,0,1,0,1,0,1,0,2,1,1,0
100918 2015-07-02 1,0,0,1,0,1,3,1,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1
100920 2013-02-07 0,1,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,5,6,4,2,1,0,1
100920 2013-02-08 0,1,0,0,1,3,5,4,2,1,0,1,0,1,0,0,1,3,7,5,1,1,1,0
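The 24 readings in each row are the hourly readings for that day. I want to convert the daily dates to hourly timestamps and turn the reading string into a single column, with the ID carried along for each hour. As a start, I have implemented the following: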
hourly <- data.frame(Hourly = seq(min(as.POSIXct(paste(df$DATE, "00:00"), tz = "")), max(as.POSIXct(paste(df$DATE, "23:00"), tz = "")), by = "hour"))
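How can I fill in the new hourly rows with the same ID as in the daily format? Since my full dataset is very large, I would greatly appreciate a solution that runs fast.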
Answer 0 (score: 0)
I can't speak to the speed of this approach on a dataset as large as yours, but I think this code performs the steps you want:
library(dplyr)
library(tidyr)
df2 <- df %>%
  # use separate to spread the readings across separate columns;
  # convert = TRUE turns the split strings into integers
  separate(reading, into = paste0("hour.", seq(24)), sep = ",", convert = TRUE) %>%
  # use gather to convert that wide data frame into a long one
  gather(key = hour, value = reading, hour.1:hour.24) %>%
  # make the hour marker into a number from 0 to 23
  # (fixed = TRUE so the "." is matched literally, not as a regex wildcard)
  mutate(hour = as.numeric(sub("hour.", "", hour, fixed = TRUE)) - 1) %>%
  # order the data
  arrange(ID, DATE, hour) %>%
  # create a new column that combines the date and time stamp
  mutate(datetime = as.POSIXct(paste(DATE, hour), format = "%Y-%m-%d %H")) %>%
  # shed unneeded columns
  select(ID, datetime, reading)
Result:
> head(df2)
      ID            datetime reading
1 100845 2014-08-17 00:00:00       0
2 100845 2014-08-17 01:00:00       0
3 100845 2014-08-17 02:00:00       0
4 100845 2014-08-17 03:00:00       0
5 100845 2014-08-17 04:00:00       3
6 100845 2014-08-17 05:00:00       0
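Since speed matters for the question, here is a minimal sketch of an alternative that skips the wide-to-long reshape and builds the long table directly with vectorized base R calls. It is not benchmarked, so treat any speed gain as an assumption to check on your own data. The toy data frame `df` and the names `vals`, `day_start`, and `hourly` are made up for illustration. The sketch also assumes every row has exactly 24 comma-separated readings and uses UTC to sidestep daylight-saving gaps, which may not match your time zone.

```r
# Toy data mirroring the layout in the question (two rows only)
df <- data.frame(
  ID = c(100845L, 100918L),
  DATE = c("2014-08-17", "2015-07-02"),
  reading = c("0,0,0,0,3,0,0,0,0,1,1,0,0,0,0,0,2,0,0,1,0,0,2,0",
              "1,0,0,1,0,1,3,1,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1"),
  stringsAsFactors = FALSE
)

# Split every reading string once; fixed = TRUE avoids regex overhead
vals <- strsplit(df$reading, ",", fixed = TRUE)

# Assumption: each day has exactly 24 readings; stop early if not
stopifnot(all(lengths(vals) == 24L))

# Midnight of each day (UTC is an assumption, see above)
day_start <- as.POSIXct(df$DATE, format = "%Y-%m-%d", tz = "UTC")

# Repeat ID and day start 24 times each, then add 0..23 hours in seconds
hourly <- data.frame(
  ID = rep(df$ID, each = 24L),
  datetime = rep(day_start, each = 24L) + 3600 * rep(0:23, times = nrow(df)),
  reading = as.integer(unlist(vals, use.names = FALSE))
)
```

The rows come out in the original row order, with hours 0 to 23 within each day, so no sorting step is needed afterwards if the input is already ordered by ID and DATE.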