aggregate function not working correctly in R

Time: 2018-08-28 17:15:36

Tags: r aggregate average

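I am trying to use the aggregate function to convert 100 Hz data into 1-minute averages. However, the 1-minute averages it returns are incorrect. The code I am using to compute the 1-minute values and a sample of the data are shown below. The code runs without error, but the results are wrong.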

aggregate(list(X = df$`Gyroscope X`,
               Y = df$`Gyroscope Y`,
               Z = df$`Gyroscope Z`),
          list(minofday = cut(df$Timestamp, "1 min")), mean)

                  Timestamp Gyroscope X Gyroscope Y Gyroscope Z
2018-07-10T10:25:00.0000000 41.381838   -21.667482  -118.896492
2018-07-10T10:25:00.0100000 48.046268   -12.399903  -110.917976
2018-07-10T10:25:00.0200000 49.102786   -7.36084    -106.485602
2018-07-10T10:25:00.0300000 44.338382   -9.215699   -102.296759
2018-07-10T10:25:00.0400000 34.724123   -11.308594  -96.108404
2018-07-10T10:25:00.0500000 19.622804   -15.225221  -88.122564
2018-07-10T10:25:00.0600000 13.240968   -26.539308  -85.274663
2018-07-10T10:25:00.0700000 13.397218   -31.933596  -80.127568
2018-07-10T10:25:00.0800000 16.333009   -29.663088  -73.027348
2018-07-10T10:25:00.0900000 17.384645   -29.745485  -67.694096
2018-07-10T10:25:00.1000000 16.546632   -30.08423   -67.565922

3 Answers:

Answer 0 (score: 1)

Assuming the OP's data spans several minutes (note the modified data below), here is how to do it with base R and with dplyr:

df$Timestamp <- as.POSIXct(df$Timestamp, format = "%Y-%m-%dT%H:%M:%S")

aggregate(list(X = df$Gyroscope_X,
               Y = df$Gyroscope_Y,
               Z = df$Gyroscope_Z),
          list(minofday = cut(df$Timestamp, "1 min")), mean)

Or, more concisely:

aggregate(. ~ minofday, mean, data = cbind(setNames(df[,-1], c("X", "Y", "Z")), 
                                           minofday = cut(df$Timestamp, "1 min")))

Result:

             minofday        X          Y          Z
1 2018-07-10 10:24:00 48.57453  -9.880371 -108.70179
2 2018-07-10 10:25:00 27.78422 -19.314983  -95.13774
3 2018-07-10 10:26:00 16.85883 -29.704286  -70.36072
4 2018-07-10 10:27:00 16.54663 -30.084230  -67.56592
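
To make the base R approach easier to check by hand, here is a minimal sketch on a toy data set. The values, the `toy` and `res` names, and `tz = "UTC"` are assumptions for illustration and are not part of the OP's data. It parses the ISO 8601 strings first and only then bins them with `cut()`.

```r
# Hypothetical toy data: three samples spanning two minutes.
toy <- data.frame(
  Timestamp = c("2018-07-10T10:25:00.0000000",
                "2018-07-10T10:25:30.0000000",
                "2018-07-10T10:26:00.0000000"),
  X = c(1, 3, 10),
  stringsAsFactors = FALSE
)

# Parse the strings into POSIXct; tz = "UTC" is an assumption for reproducibility.
toy$Timestamp <- as.POSIXct(toy$Timestamp,
                            format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# cut() on a POSIXct vector returns a factor of one-minute bins.
toy$minofday <- cut(toy$Timestamp, "1 min")

# Average X within each one-minute bin.
res <- aggregate(X ~ minofday, data = toy, FUN = mean)
print(res)
```

The first two samples fall in the 10:25 bin and the third in the 10:26 bin, so the per-minute means can be verified directly against the input.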

Using summarize_all from dplyr, together with lubridate:

library(dplyr)
library(lubridate)

df %>%
  mutate(Timestamp = ymd_hms(Timestamp)) %>%
  group_by(minofday = cut(Timestamp, "1 min")) %>%
  summarize_all(mean) %>%
  select(-Timestamp)

Result:

# A tibble: 4 x 4
  minofday            Gyroscope_X Gyroscope_Y Gyroscope_Z
  <fct>                     <dbl>       <dbl>       <dbl>
1 2018-07-10 10:24:00        48.6       -9.88      -109. 
2 2018-07-10 10:25:00        27.8      -19.3        -95.1
3 2018-07-10 10:26:00        16.9      -29.7        -70.4
4 2018-07-10 10:27:00        16.5      -30.1        -67.6

Data:

df <- read.table(text = "Timestamp  Gyroscope_X Gyroscope_Y Gyroscope_Z
2018-07-10T10:25:00.0000000 41.381838   -21.667482  -118.896492
2018-07-10T10:24:00.0100000 48.046268   -12.399903  -110.917976
2018-07-10T10:24:00.0200000 49.102786   -7.36084    -106.485602
2018-07-10T10:25:00.0300000 44.338382   -9.215699   -102.296759
2018-07-10T10:25:00.0400000 34.724123   -11.308594  -96.108404
2018-07-10T10:25:00.0500000 19.622804   -15.225221  -88.122564
2018-07-10T10:25:00.0600000 13.240968   -26.539308  -85.274663
2018-07-10T10:25:00.0700000 13.397218   -31.933596  -80.127568
2018-07-10T10:26:00.0800000 16.333009   -29.663088  -73.027348
2018-07-10T10:26:00.0900000 17.384645   -29.745485  -67.694096
2018-07-10T10:27:00.1000000 16.546632   -30.08423   -67.565922", header = TRUE)

Answer 1 (score: 0)

Here is a solution using the tidyverse packages lubridate and dplyr (it assumes Timestamp has already been parsed into a date-time):

library(dplyr)
library(lubridate)
df %>%
  mutate(day = day(Timestamp),
         hour = hour(Timestamp),
         min = minute(Timestamp)) %>%
  group_by(day, hour, min) %>%
  summarise(
    `Gyroscope X` = mean(`Gyroscope X`),
    `Gyroscope Y` = mean(`Gyroscope Y`),
    `Gyroscope Z` = mean(`Gyroscope Z`)
  )
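
If Timestamp is still a character column, one possible variant of the same idea is to parse it explicitly and group on the timestamp rounded down to the minute. The sketch below is illustrative only: the toy values and the `toy`, `out`, and `minute` names are assumptions, and it swaps the separate day/hour/min columns for lubridate's `floor_date()`.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data with character timestamps.
toy <- data.frame(
  Timestamp = c("2018-07-10T10:25:00.0000000",
                "2018-07-10T10:25:30.0000000",
                "2018-07-10T10:26:00.0000000"),
  `Gyroscope X` = c(1, 3, 10),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

out <- toy %>%
  # Parse the ISO 8601 strings, then round each time down to its minute.
  mutate(Timestamp = ymd_hms(Timestamp),
         minute = floor_date(Timestamp, "minute")) %>%
  group_by(minute) %>%
  summarise(`Gyroscope X` = mean(`Gyroscope X`))
print(out)
```

Grouping on a single floored date-time keeps the year and month in the key, so samples from the same minute on different dates are never merged.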

Answer 2 (score: 0)

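Since you are working with timestamps, the xts package has many functions that can help. To aggregate by timestamp, period.apply is the one to use. Its endpoints argument lets you roll the data up at any level from microseconds to years.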

library(xts)

# leave the timestamp column out of the data; it goes in the order.by argument
df1_xts <- xts(df1[, -1], order.by = df1$Timestamp)

# roll up to minutes
period.apply(df1_xts, endpoints(df1_xts, on = "mins"), colMeans)

                    Gyroscope_X Gyroscope_Y Gyroscope_Z
2018-07-10 10:25:00    28.55624   -20.46759   -90.59249
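
Because the sample data covers a single minute, the output above has only one row. A small sketch on a toy series that spans several minutes shows what `endpoints()` returns and how `period.apply()` uses it. The values, the `toy_xts`, `ep`, and `res` names, and `tz = "UTC"` are assumptions for illustration.

```r
library(xts)

# Hypothetical toy series: five samples across three minutes (UTC assumed).
times <- as.POSIXct(c("2018-07-10 10:24:00", "2018-07-10 10:24:30",
                      "2018-07-10 10:25:00", "2018-07-10 10:25:30",
                      "2018-07-10 10:26:00"), tz = "UTC")
toy_xts <- xts(cbind(X = c(1, 3, 10, 20, 5)), order.by = times)

# endpoints() returns the row index of the last observation in each minute,
# preceded by a leading 0.
ep <- endpoints(toy_xts, on = "minutes")
print(ep)

# period.apply() applies the function to the rows between consecutive endpoints.
res <- period.apply(toy_xts, ep, colMeans)
print(res)
```

Each row of the result is stamped with the time of the last observation in its period, which is why the output above shows the final timestamp of the minute rather than its start.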

If your timestamp column is not yet a date-time object, you can convert it like this:

df1$Timestamp <- strptime(df1$Timestamp, format = "%Y-%m-%dT%H:%M:%OS")
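
The `%OS` specifier in that format string is what keeps the fractional seconds of the 100 Hz samples. A minimal sketch, assuming `tz = "UTC"` and using one timestamp copied from the question (the `ts` name is illustrative):

```r
# Parse one timestamp from the question; tz = "UTC" is an assumption.
ts <- strptime("2018-07-10T10:25:00.0100000",
               format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# The fractional part is stored in the seconds component of the POSIXlt object.
print(ts$sec)

# Default printing hides fractional seconds; "%OS3" asks for three decimals.
print(format(ts, "%Y-%m-%d %H:%M:%OS3"))
```

With plain `%S` instead of `%OS`, the fractional part is dropped, which does not affect per-minute grouping but does matter if the sub-second ordering is needed later.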

Data:

df1 <- structure(list(Timestamp = structure(list(sec = c(0, 0.01, 0.02, 
0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1), min = c(25L, 
25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L, 25L), hour = c(10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mday = c(10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mon = c(6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(118L, 118L, 
118L, 118L, 118L, 118L, 118L, 118L, 118L, 118L, 118L), wday = c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), yday = c(190L, 190L, 
190L, 190L, 190L, 190L, 190L, 190L, 190L, 190L, 190L), isdst = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), zone = c("CEST", "CEST", 
"CEST", "CEST", "CEST", "CEST", "CEST", "CEST", "CEST", "CEST", 
"CEST"), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_)), class = c("POSIXlt", "POSIXt")), 
    Gyroscope_X = c(41.381838, 48.046268, 49.102786, 44.338382, 
    34.724123, 19.622804, 13.240968, 13.397218, 16.333009, 17.384645, 
    16.546632), Gyroscope_Y = c(-21.667482, -12.399903, -7.36084, 
    -9.215699, -11.308594, -15.225221, -26.539308, -31.933596, 
    -29.663088, -29.745485, -30.08423), Gyroscope_Z = c(-118.896492, 
    -110.917976, -106.485602, -102.296759, -96.108404, -88.122564, 
    -85.274663, -80.127568, -73.027348, -67.694096, -67.565922
    )), row.names = c(NA, -11L), class = "data.frame")