I have a dataset covering a full year:
Month Day Time X Y
...
3 1 0 2 4
3 1 1 4 2
3 1 2 7 3
3 1 3 8 8
3 1 4 4 6
3 1 5 1 4
3 1 6 6 6
3 1 7 7 9
...
3 2 0 5 7
3 2 1 7 2
3 2 2 9 3
...
4 1 0 2 8
...
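I want to find the maximum of X for each day and create a plot for each day that runs from the start of the day (Time 0) up to the maximum found. I tried using a data frame but got a bit lost, and the dataset is large, so I am not sure that is the best idea.
Any ideas on how to do this?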
Answer 0 (score: 0)
If I understand you correctly, this should work.
Sample dataset:
set.seed(123)
df <- data.frame(Month = sample(c(1:12), 30, replace = TRUE),
Day = sample(c(1:31), 30, replace = TRUE),
Time = sample(c(1:24), 30, replace = TRUE),
x = rnorm(30, mean = 10, sd = 5),
y = rnorm(30, mean = 10, sd = 5))
Using tidyverse (ggplot2 and dplyr):
library(tidyverse)
df %>%
  # Group by month and day
  group_by(Month, Day) %>%
  # Per-day maxima of x and y; values above the corresponding maximum are set to NA
  mutate(maxX = max(x, na.rm = TRUE),
         maxY = max(y, na.rm = TRUE),
         plotX = ifelse(x > maxX, NA, x),
         plotY = ifelse(y > maxY, NA, y)) %>%
  ungroup() %>%
  # Select and gather only the variables needed for the plot
  select(Time, plotX, plotY) %>%
  gather(plot, value, -Time) %>%
  # Plot
  ggplot(aes(Time, value, color = plot)) +
  geom_point()
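To restrict each day to the rows from the start of the day up to the moment x peaks, as the question asks, one option is to compare `Time` with the time at which the daily maximum occurs. The following is only a minimal sketch under stated assumptions: the toy values are made up for illustration, and `upto_max` is a hypothetical name, not something from the original post.

```r
library(dplyr)

# Toy data shaped like the question's table (values are made up for illustration)
df <- data.frame(
  Month = c(3, 3, 3, 3, 3, 3, 3),
  Day   = c(1, 1, 1, 1, 2, 2, 2),
  Time  = c(0, 1, 2, 3, 0, 1, 2),
  x     = c(2, 4, 8, 7, 5, 7, 9)
)

upto_max <- df %>%
  group_by(Month, Day) %>%
  # Sort by time within each day so that positions follow the clock
  arrange(Time, .by_group = TRUE) %>%
  # which.max() returns the first position of the maximum,
  # so a tie resolves to the earliest time
  filter(Time <= Time[which.max(x)]) %>%
  ungroup()
```

In this toy data the row for day 1 at Time 3 is dropped because it comes after that day's peak at Time 2; every other row is kept.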
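Answer 1 (score: 0)
You can try using tidyverse. Duplicated times within each month and day are removed without any ranking, so the first occurrence is simply kept.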
library(tidyverse)
set.seed(123)
df <- data.frame(Month = sample(c(1:2), 30, replace = TRUE),
Day = sample(c(1:2), 30, replace = TRUE),
Time = sample(c(1:10), 30, replace = TRUE),
x = rnorm(30, mean = 10, sd = 5),
y = rnorm(30, mean = 10, sd = 5))
df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>% # remove duplicated "Time" values, keeping the first occurrence
  filter(Time <= Time[which.max(x)]) %>% # keep rows up to the time of the daily maximum of x
  ggplot(aes(Time, x)) +
  geom_line() +
  geom_point(data = . %>% filter(x == max(x))) + # mark each day's maximum
  facet_grid(Month ~ Day, labeller = label_both)
Or try putting them all in one plot, using different colours:
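df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>% # remove duplicated "Time" values, keeping the first occurrence
  filter(Time <= Time[which.max(x)]) %>% # keep rows up to the time of the daily maximum of x
  ggplot(aes(Time, x, color = interaction(Month, Day))) +
  geom_line() +
  geom_point(data = . %>% filter(x == max(x))) # mark each day's maximum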
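If a separate plot object per day is preferred over facets or colours, the trimmed data can be split by day and plotted in a loop. This is a rough sketch rather than a definitive solution: the names `trimmed`, `by_day` and `plots` are hypothetical, and the commented-out file names are only placeholders.

```r
library(dplyr)
library(ggplot2)

# Same style of sample data as above
set.seed(123)
df <- data.frame(Month = sample(1:2, 30, replace = TRUE),
                 Day   = sample(1:2, 30, replace = TRUE),
                 Time  = sample(1:10, 30, replace = TRUE),
                 x     = rnorm(30, mean = 10, sd = 5))

# Keep, for each day, only the rows up to the time of the daily maximum of x
trimmed <- df %>%
  group_by(Month, Day) %>%
  filter(!duplicated(Time)) %>%
  filter(Time <= Time[which.max(x)]) %>%
  ungroup()

# One data frame per Month.Day combination; drop = TRUE omits
# combinations that do not occur in the data
by_day <- split(trimmed, list(trimmed$Month, trimmed$Day), drop = TRUE)

# Build one ggplot object per day, titled with its Month.Day id
plots <- lapply(names(by_day), function(id) {
  ggplot(by_day[[id]], aes(Time, x)) +
    geom_line() +
    geom_point() +
    ggtitle(paste("Month.Day:", id))
})
names(plots) <- names(by_day)

# Optionally write each plot to its own file (file names are placeholders):
# for (id in names(plots)) ggsave(paste0("day_", id, ".png"), plots[[id]])
```

Each element of `plots` can be printed on its own, for example `plots[[1]]`. For a full year this produces a few hundred plot objects, so writing them to files is likely more practical than viewing them interactively.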