Question

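I store analytics data in a MySQL database as a table with a timestamp and some data. I want to show this data on an admin console downsampled, i.e. grouped into time ranges, with the number of entries counted per range. Is it more efficient to select the raw data and downsample it with an R script, or is it better to use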

GROUP BY UNIX_TIMESTAMP(timestamp) DIV <some time>

and do the work in the database layer? Any other tips would also be appreciated.
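For comparison, here is a minimal base-R sketch of the "select, then downsample in R" option. It assumes the raw timestamps have already been fetched into a data frame; the sample values and the one-hour bucket width are hypothetical. It applies the same integer division by the bucket width that `UNIX_TIMESTAMP(timestamp) DIV <some time>` performs in MySQL, then counts the rows in each bucket.

```r
# Hypothetical sample data: event timestamps as they might be returned
# by a query such as SELECT timestamp FROM some_table
events <- data.frame(
  timestamp = as.POSIXct(c(
    "2015-03-01 10:05:00", "2015-03-01 10:40:00",
    "2015-03-01 11:15:00", "2015-03-01 13:59:59"
  ), tz = "UTC")
)

# Bucket width in seconds (3600 = one hour, an assumed example value);
# it plays the role of <some time> in the SQL DIV expression
bucket_seconds <- 3600

# Integer-divide the Unix time by the bucket width, like MySQL's DIV
events$bucket <- as.numeric(events$timestamp) %/% bucket_seconds

# Count entries per bucket, like COUNT(*) ... GROUP BY bucket
n_per_bucket <- table(events$bucket)
counts <- data.frame(
  bucket = as.numeric(names(n_per_bucket)),
  n = as.integer(n_per_bucket)
)

# Convert each bucket number back to the time at which the bucket starts
counts$bucket_start <- as.POSIXct(counts$bucket * bucket_seconds,
                                  origin = "1970-01-01", tz = "UTC")
print(counts)
```

The result has one row per non-empty hour (10:00 with 2 entries, 11:00 and 13:00 with 1 each). The arithmetic is identical to the SQL version; the practical difference is that doing it in R means every raw row must first be transferred from MySQL, whereas grouping in the database returns only one row per bucket.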

Answer 1

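If you can use dplyr, you can do this as follows. dplyr translates the pipeline into a single SQL query, so the grouping and counting run in MySQL and only the per-group counts are pulled into R: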

library(dplyr)

yay <-
  # Specify username and password in my.cnf
  src_mysql(host = "blah.com") %>%
  tbl("some_table") %>%
  # You will need to compute a grouping variable: integer-divide the
  # Unix time by the bucket width in seconds (3600 = one hour, as an example)
  mutate(group = floor(unix_timestamp(timestamp) / 3600)) %>%
  group_by(group) %>%
  # This will return the number of rows in each group
  summarise(n = n()) %>%
  # This will execute the query and return a data.frame
  collect()

Downsampling analytics data in MySQL or R
