Question

I have a huge data frame with 10 million rows in RStudio, in the following format:

       ID      DATE        reading
    100845 2014-08-17 0,0,0,0,3,0,0,0,0,1,1,0,0,0,0,0,2,0,0,1,0,0,2,0
    100845 2014-08-18 0,0,4,0,0,0,0,1,0,0,0,0,1,1,1,1,0,1,1,2,1,1,0,1
    100845 2014-08-19 0,1,0,1,0,1,1,1,2,0,1,0,1,0,1,0,1,0,1,0,2,1,1,0
    100918 2015-07-02 1,0,0,1,0,1,3,1,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1
    100920 2013-02-07 0,1,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,5,6,4,2,1,0,1
    100920 2013-02-08 0,1,0,0,1,3,5,4,2,1,0,1,0,1,0,0,1,3,7,5,1,1,1,0

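The 24 readings in each row are the hourly readings for that day. I want to convert the daily dates into hourly timestamps and split the reading string into a single column, with the ID repeated accordingly for every hour. So far I have tried the following: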

hourly <- data.frame(Hourly = seq(min(as.POSIXct(paste(df$DATE, "00:00"), tz = "")),
                                  max(as.POSIXct(paste(df$DATE, "23:00"), tz = "")),
                                  by = "hour"))
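The attempt above builds one continuous hourly timeline from the earliest to the latest date across all IDs, so it contains hours for days (and IDs) that never appear in the data, and there is no one-to-one match to fill the IDs into. A small base R sketch on the six sample dates makes the mismatch concrete; `tz = "UTC"` and the names `dates`, `start`, `end` and `global_hours` are assumptions for illustration only.

```r
# Dates from the six sample rows in the question
dates <- as.Date(c("2014-08-17", "2014-08-18", "2014-08-19",
                   "2015-07-02", "2013-02-07", "2013-02-08"))

# One global hourly sequence, as in the attempt above (tz = "UTC" is assumed)
start <- as.POSIXct(paste(min(dates), "00:00"), format = "%Y-%m-%d %H:%M", tz = "UTC")
end   <- as.POSIXct(paste(max(dates), "23:00"), format = "%Y-%m-%d %H:%M", tz = "UTC")
global_hours <- seq(start, end, by = "hour")

# The timeline covers every hour between the extremes, not just observed days
length(global_hours)  # 21024
length(dates) * 24    # 144 hours actually present in the data
```

So the hourly timestamps have to be generated per input row rather than once for the whole data frame.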

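How can I fill the new rows created by the hourly sequence with the same IDs as in the daily format? Since my full dataset is very large, I would greatly appreciate a solution that runs as fast as possible.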
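Since speed matters here, one option (a swapped-in base R technique, not something from the question or the answer) is to skip the wide-to-long reshape altogether: split all strings in a single vectorized `strsplit()` call and build the ID and timestamp columns with `rep()`. This sketch assumes every `reading` string holds exactly 24 comma-separated values and that timestamps can be treated as UTC; `hourly_long`, `vals` and `midnight` are hypothetical names.

```r
# Sample rows copied from the question
df <- data.frame(
  ID = c(100845, 100845, 100845, 100918, 100920, 100920),
  DATE = as.Date(c("2014-08-17", "2014-08-18", "2014-08-19",
                   "2015-07-02", "2013-02-07", "2013-02-08")),
  reading = c("0,0,0,0,3,0,0,0,0,1,1,0,0,0,0,0,2,0,0,1,0,0,2,0",
              "0,0,4,0,0,0,0,1,0,0,0,0,1,1,1,1,0,1,1,2,1,1,0,1",
              "0,1,0,1,0,1,1,1,2,0,1,0,1,0,1,0,1,0,1,0,2,1,1,0",
              "1,0,0,1,0,1,3,1,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1",
              "0,1,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,5,6,4,2,1,0,1",
              "0,1,0,0,1,3,5,4,2,1,0,1,0,1,0,0,1,3,7,5,1,1,1,0"),
  stringsAsFactors = FALSE
)

# Split every reading string in one call; unlist keeps row-by-row order
vals <- as.numeric(unlist(strsplit(df$reading, ",", fixed = TRUE)))

# Assumption: each row holds exactly 24 hourly values
stopifnot(length(vals) == 24 * nrow(df))

# Midnight of each day, interpreted as UTC (an assumption that avoids DST gaps)
midnight <- as.POSIXct(format(df$DATE), format = "%Y-%m-%d", tz = "UTC")

hourly_long <- data.frame(
  ID       = rep(df$ID, each = 24),  # repeat each ID once per hour
  datetime = rep(midnight, each = 24) + rep(0:23, times = nrow(df)) * 3600,
  reading  = vals
)

head(hourly_long)
```

Each output column comes from a single vectorized call, so there is no per-row loop. For 10 million input rows the result would have about 240 million rows, so memory rather than speed is likely to become the limiting factor.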

Answer 1

I can't speak to the speed of this approach on a dataset as large as yours, but I think this code performs the steps you want:

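library(dplyr)
library(tidyr)

df2 <- df %>%
  # use separate to spread the readings across separate columns;
  # convert = TRUE turns the pieces into numbers instead of strings
  separate(reading, into = paste0("hour.", seq(24)), sep = ",", convert = TRUE) %>%
  # use gather to convert that wide data frame into a long one
  # (gather() still works but is superseded by pivot_longer() in tidyr >= 1.0)
  gather(key = hour, value = reading, hour.1:hour.24) %>%
  # make the hour marker into a number from 0 to 23;
  # fixed = TRUE matches "." literally rather than as a regex wildcard
  mutate(hour = as.numeric(gsub("hour.", "", hour, fixed = TRUE)) - 1) %>%
  # order the data
  arrange(ID, DATE, hour) %>%
  # create a new column that combines the date and time stamp;
  # tz = "UTC" sidesteps daylight-saving gaps in the local time zone
  mutate(datetime = as.POSIXct(paste(DATE, hour), format = "%Y-%m-%d %H", tz = "UTC")) %>%
  # shed unneeded columns
  select(ID, datetime, reading)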
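Two string steps in the pipeline above are easy to check in isolation: stripping the `hour.` prefix, and parsing `paste(DATE, hour)` with `%H`, which accepts unpadded hours such as `"5"` because leading zeros are optional on input to `strptime()`. A minimal base R sketch, with `tz = "UTC"` and the names `labels`, `hour` and `datetime` as assumptions:

```r
# Hour labels as produced by separate(): "hour.1" ... "hour.24"
labels <- paste0("hour.", 1:24)

# fixed = TRUE treats "." literally instead of as a regex wildcard
hour <- as.numeric(gsub("hour.", "", labels, fixed = TRUE)) - 1

# paste() yields strings like "2014-08-17 5"; %H parses the unpadded hour
datetime <- as.POSIXct(paste(as.Date("2014-08-17"), hour),
                       format = "%Y-%m-%d %H", tz = "UTC")

print(head(datetime, 3))
```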

Result:

> head(df2)
      ID            datetime reading
1 100845 2014-08-17 00:00:00       0
2 100845 2014-08-17 01:00:00       0
3 100845 2014-08-17 02:00:00       0
4 100845 2014-08-17 03:00:00       0
5 100845 2014-08-17 04:00:00       3
6 100845 2014-08-17 05:00:00       0

How can I expand a column of a data frame after formatting another column?
