Question

I have a database covering a full year:

Month Day Time  X   Y
...
3      1    0   2   4
3      1    1   4   2
3      1    2   7   3
3      1    3   8   8
3      1    4   4   6
3      1    5   1   4
3      1    6   6   6
3      1    7   7   9
...
3      2    0   5   7
3      2    1   7   2
3      2    2   9   3
...
4      1    0   2   8
...

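I want to find the maximum of X for each day and create a plot for each day that runs from the start of the day (Time 0) up to that maximum. I tried using a data frame but got a bit lost, and the database is large, so I'm not sure that is the best idea.

Any ideas on how to do this?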

Answer 1

If I understand you correctly, this should work:

Sample dataset:

set.seed(123)
df <- data.frame(Month = sample(c(1:12), 30, replace = TRUE), 
                 Day = sample(c(1:31), 30, replace = TRUE), 
                 Time = sample(c(1:24), 30, replace = TRUE),
                 x = rnorm(30, mean = 10, sd = 5),
                 y = rnorm(30, mean = 10, sd = 5))

Using tidyverse (ggplot2 and dplyr):

library(tidyverse)
df %>% 
  #Grouping by month and day
  group_by(Month, Day) %>% 
  #Creating new variables for x and y - the per-day max value, and setting values bigger than the max value to NA.
  mutate(maxX = max(x, na.rm = TRUE), 
         maxY = max(y, na.rm = TRUE), 
         plotX = ifelse(x > maxX, NA, x),  # compare x with its own maximum (was maxY)
         plotY = ifelse(y > maxY, NA, y)) %>% 
  ungroup() %>%
  #Select and gather only the needed variables for the plot
  select(Time, plotX, plotY) %>% 
  gather(plot, value, -Time) %>%
  #Plot
  ggplot(aes(Time, value, color = plot)) + 
  geom_point()
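
To cut each day off at the time of its largest x, as the question asks, one option is to sort each day by Time and keep the rows up to the position of the maximum. This is only a minimal sketch: the small data frame is hand-made in the shape of the question's table, and `trimmed` is a name introduced here for illustration.

```r
library(dplyr)

# Hand-made stand-in for the real data (assumption: same columns as the sample above)
df <- data.frame(
  Month = c(3, 3, 3, 3, 3, 3, 3),
  Day   = c(1, 1, 1, 1, 2, 2, 2),
  Time  = c(0, 1, 2, 3, 0, 1, 2),
  x     = c(2, 4, 8, 7, 5, 9, 3)
)

trimmed <- df %>%
  group_by(Month, Day) %>%
  # Sort by Time inside each day so row positions follow the clock
  arrange(Time, .by_group = TRUE) %>%
  # which.max() gives the first position of the maximum, so ties resolve to the earliest time
  filter(row_number() <= which.max(x)) %>%
  ungroup()
```

`trimmed` can then be passed to ggplot() in place of the masked columns.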


Answer 2

You can try a tidyverse approach. Duplicated times within each month and day are removed, without any ranking.

library(tidyverse)
set.seed(123)
df <- data.frame(Month = sample(c(1:2), 30, replace = TRUE), 
                 Day = sample(c(1:2), 30, replace = TRUE), 
                 Time = sample(c(1:10), 30, replace = TRUE),
                 x = rnorm(30, mean = 10, sd = 5),
                 y = rnorm(30, mean = 10, sd = 5))

df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>%  # remove duplicated "Time" values within each day
  filter(Time <= Time[which.max(x)]) %>%  # keep rows up to the time of the (first) maximum x
  ggplot(aes(Time, x)) + 
   geom_line() + 
   geom_point(data=. %>% filter(x == max(x)))+ 
   facet_grid(Month~Day, labeller = label_both)
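
To check the plots against actual numbers, it may help to tabulate each day's maximum x together with the time at which it occurs. A small sketch, again on a hand-made stand-in data frame; `daily_peak`, `max_x` and `peak_time` are names made up for this example.

```r
library(dplyr)

# Hand-made stand-in for the real data (assumption: same columns as the sample above)
df <- data.frame(
  Month = c(3, 3, 3, 3, 3, 3, 3),
  Day   = c(1, 1, 1, 1, 2, 2, 2),
  Time  = c(0, 1, 2, 3, 0, 1, 2),
  x     = c(2, 4, 8, 7, 5, 9, 3)
)

daily_peak <- df %>%
  group_by(Month, Day) %>%
  summarise(
    max_x     = max(x, na.rm = TRUE),  # largest x of the day
    peak_time = Time[which.max(x)],    # time of the first occurrence of that maximum
    .groups   = "drop"
  )
```

Each row of `daily_peak` describes one day, so a full year yields at most 366 rows regardless of how large the original table is.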

Or try putting them all in one plot, using a different colour for each day:

df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>%  # remove duplicated "Time" values within each day
  filter(Time <= Time[which.max(x)]) %>%  # keep rows up to the time of the (first) maximum x
  ggplot(aes(Time, x, color = interaction(Month, Day))) + 
   geom_line() + 
   geom_point(data=. %>% filter(x == max(x)))
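
With a full year of data, several hundred facets or colours become hard to read, and the question asks for a plot per day. One possible sketch is to split the data by day and build a list of separate plot objects. The helper `trim_to_peak()` and the names `days` and `plots` are hypothetical, and the data frame is a hand-made stand-in.

```r
library(ggplot2)

# Hand-made stand-in for the real data (assumption: same columns as the sample above)
df <- data.frame(
  Month = c(3, 3, 3, 3, 3, 3, 3),
  Day   = c(1, 1, 1, 1, 2, 2, 2),
  Time  = c(0, 1, 2, 3, 0, 1, 2),
  x     = c(2, 4, 8, 7, 5, 9, 3)
)

# Hypothetical helper: for one day's rows, keep everything from the
# earliest time up to the time of the (first) maximum x
trim_to_peak <- function(d) {
  d <- d[order(d$Time), ]
  d[seq_len(which.max(d$x)), ]
}

# One data frame per Month/Day combination; drop = TRUE skips combinations with no rows
days <- split(df, list(df$Month, df$Day), drop = TRUE)

# One ggplot object per day, with the peak marked as a point
plots <- lapply(days, function(d) {
  peak <- trim_to_peak(d)
  ggplot(peak, aes(Time, x)) +
    geom_line() +
    geom_point(data = peak[nrow(peak), ]) +
    ggtitle(paste0("Month ", d$Month[1], ", Day ", d$Day[1]))
})
```

The list is named by "Month.Day", so `plots[["3.1"]]` shows the plot for Month 3, Day 1, and each element could be written to its own file with ggsave().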

Find the maximum value of a column for each day of the year and create a plot up to that value in R
