Summarizing time series data

Time: 2019-01-02 19:39:03

Tags: r time-series

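I think this is probably really easy, but I'm not comfortable with R.
I have a data file with two columns: a "Date and Time" column, which I have already converted to a date-time in R, and a glucose column (sample below). There is a reading every 5 minutes, and I'm trying to get the 24-hour average, and then the averages for 11pm-6am and 6am-11pm.
I can't figure out how to write the code to get this. I tried using apply.daily to get the 24-hour average, but it gave me an error.

Sample data: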

Datetime            Glucose
2018-03-07 23:01:04 154
2018-03-07 23:06:04 235
2018-03-07 23:11:04 232
2018-03-07 23:16:04 144
2018-03-07 23:21:04 134
2018-03-07 23:26:04 107
2018-03-07 23:31:04 108
2018-03-07 23:36:04 122
2018-03-07 23:41:04 143
2018-03-07 23:46:04 113
2018-03-07 23:51:04 115
2018-03-07 23:56:04 116
2018-03-08 00:01:04 117
2018-03-08 00:06:04 117
2018-03-08 00:11:04 114
2018-03-08 00:16:04 109

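2 answers:

Answer 0: (score: 0)

A data.table approach (with custom sample data)

You may need to change the code that defines the periods, because (I assume?) you want the 23-06 period to run from 23:00 on one day until 06:00 on the next day...

Sample data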

library( data.table )

#create sample data
dt <- fread("Datetime            Glucose
2018-03-07T22:01:04 154
2018-03-07T22:06:04 235
2018-03-07T22:11:04 232
2018-03-07T23:16:04 144
2018-03-07T23:21:04 134
2018-03-07T23:26:04 107
2018-03-07T23:31:04 108
2018-03-07T23:36:04 122
2018-03-07T23:41:04 143
2018-03-07T23:46:04 113
2018-03-07T23:51:04 115
2018-03-07T23:56:04 116
2018-03-08T00:01:04 117
2018-03-08T00:06:04 117
2018-03-08T00:11:04 114
2018-03-08T00:16:04 109", header = TRUE)
dt[ , Datetime := as.POSIXct( Datetime, format = "%Y-%m-%dT%H:%M:%S" ) ]

Code

#create period 6-23 and 23-6
dt[ , period := ifelse( hour( Datetime ) >= 23 | hour( Datetime ) < 6 , "eleven-six", "six-eleven" )]

#daily mean
dt[, .( mean.Glucose = mean( Glucose) ), by = .( day = as.Date( Datetime, tz = "" ) ) ][]
#           day mean.Glucose
# 1: 2018-03-07     143.5833
# 2: 2018-03-08     114.2500

#mean per period
dt[, .( mean.Glucose = mean( Glucose) ), by = .( day = as.Date( Datetime, tz = "" ), period ) ][]
#           day     period mean.Glucose
# 1: 2018-03-07 six-eleven     207.0000
# 2: 2018-03-07 eleven-six     122.4444
# 3: 2018-03-08 eleven-six     114.2500
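
The caveat at the top of this answer (a night period running from 23:00 until 06:00 the next day) can be handled by labelling each reading with the date on which its window started. A minimal sketch, assuming that shifting timestamps back by six hours is an acceptable way to define that date; the readings and the window_start column are made up for illustration, and the timestamps are assumed to be in UTC:

```r
library(data.table)

# Hypothetical readings that span midnight (not the asker's actual data)
dt2 <- data.table(
  Datetime = as.POSIXct(c("2018-03-07 22:01:04", "2018-03-07 23:16:04",
                          "2018-03-08 00:01:04", "2018-03-08 05:56:04",
                          "2018-03-08 06:01:04"), tz = "UTC"),
  Glucose = c(154, 144, 117, 109, 120)
)

# Same period definition as in the answer
dt2[, period := ifelse(hour(Datetime) >= 23 | hour(Datetime) < 6,
                       "eleven-six", "six-eleven")]

# Assumption: subtracting 6 hours maps 06:00-05:59 (next day) onto one date,
# so a night starting at 23:00 keeps the date on which it started
dt2[, window_start := as.Date(Datetime - 6 * 3600, tz = "UTC")]

res <- dt2[, .(mean.Glucose = mean(Glucose)), by = .(window_start, period)]
print(res)
```

With this labelling, the readings at 00:01 and 05:56 on 2018-03-08 are averaged together with the 23:16 reading from 2018-03-07 instead of forming a separate group.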

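Answer 1: (score: 0)

You'll want to look into the lubridate package. Here is a lubridate approach that uses the tidyverse for each of the items.

  1. Use ymd_hms to convert the strings to date-times.
  2. Use day and hour to create grouping categories for the summaries.
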
library(tidyverse)
library(lubridate)

df <- tribble(~date_time, ~glucose,
"2018-03-07 23:01:04",             154,
"2018-03-07 23:06:04",             235,
"2018-03-07 23:11:04",             232,
"2018-03-07 23:16:04",             144,
"2018-03-07 23:21:04",             134,
"2018-03-07 23:26:04",             107,
"2018-03-07 23:31:04",             108,
"2018-03-07 23:36:04",             122,
"2018-03-07 23:41:04",             143,
"2018-03-07 23:46:04",             113,
"2018-03-07 23:51:04",             115,
"2018-03-07 23:56:04",             116,
"2018-03-08 00:01:04",             117,
"2018-03-08 00:06:04",             117,
"2018-03-08 00:11:04",             114,
"2018-03-08 00:16:04",             109)


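## Get daily average glucose
## Note: day() returns only the day of the month, so data spanning
## several months would need as_date(date_time) as the grouping key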
df %>% 
  mutate(date_time = ymd_hms(date_time),
         day = day(date_time)) %>% 
  group_by(day) %>% 
  summarize(mean_glucose = mean(glucose))

#> # A tibble: 2 x 2
#>     day mean_glucose
#>   <int>        <dbl>
#> 1     7         144.
#> 2     8         114.

## Get 11pm-6am and 6am-11pm averages
## (hour 23 belongs to the night period, so the day range is 6 <= hour < 23;
## between(hour, 6, 23) is inclusive and would count 23:xx as daytime)
df %>% 
  mutate(date_time = ymd_hms(date_time),
         hour = hour(date_time),
         range = if_else(hour >= 6 & hour < 23, "6am - 11pm", "11pm - 6am")) %>% 
  group_by(range) %>% 
  summarize(mean_glucose = mean(glucose))

#> # A tibble: 1 x 2
#>   range      mean_glucose
#>   <chr>             <dbl>
#> 1 11pm - 6am         136.

Created on 2019-01-02 by the reprex package (v0.2.1)
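
If one row per calendar date and period is wanted, the two summaries above can be combined into a single pipeline. A rough sketch, assuming as_date() is used as the grouping key (so that the same day of the month in different months does not collide) and that 23:00-23:59 counts as night; the readings below are hypothetical:

```r
library(dplyr)
library(lubridate)

# Hypothetical readings spanning two calendar days (not the asker's data)
df2 <- tibble(
  date_time = ymd_hms(c("2018-03-07 22:56:04", "2018-03-07 23:01:04",
                        "2018-03-07 23:06:04", "2018-03-08 00:01:04",
                        "2018-03-08 06:01:04")),
  glucose = c(150, 154, 235, 117, 120)
)

result <- df2 %>%
  mutate(
    day   = as_date(date_time),   # full calendar date, not just day of month
    hour  = hour(date_time),
    range = if_else(hour >= 6 & hour < 23, "6am - 11pm", "11pm - 6am")
  ) %>%
  group_by(day, range) %>%
  summarize(mean_glucose = mean(glucose), .groups = "drop")

print(result)
```

Note that this groups by calendar date, so a night that crosses midnight is split across two rows; whether that is what you want depends on how a "night" should be defined.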