R: plot a time series summarizing multiple variables

Posted: 2018-05-22 04:42:35

Tags: r plot ggplot2 time-series

This is my first attempt at a time-series plot. I have a dataset of about 50k rows covering many years, structured like the sample below:

Year    expense_1   expense_2   expense_3   expense_4
1999    5           NA          NA          31.82
2000    2           NA          NA          4.75
1999    10.49       NA          NA          NA
2000    39.69       NA          NA          NA
2000    NA          NA          10.61       NA
1999    8.08        NA          NA          NA
2000    16          NA          NA          NA
1999    9.32        NA          NA          NA
1999    9.35        NA          NA          NA

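Now I want to plot a time series with Year on the X axis and expense on the Y axis, with a separate line for each of expense_1, expense_2, expense_3 and expense_4. The expenses in each category should be summed by year, with NAs removed.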
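A minimal base-R sketch of the per-year aggregation described above, assuming the data is in a data frame named df with the columns shown (only the nine sample rows are used here). tapply() sums each expense column within each Year, and na.rm = TRUE drops the NAs before summing:

```r
# Sample data from the question (the nine rows shown above)
df <- data.frame(
  Year      = c(1999, 2000, 1999, 2000, 2000, 1999, 2000, 1999, 1999),
  expense_1 = c(5, 2, 10.49, 39.69, NA, 8.08, 16, 9.32, 9.35),
  expense_2 = NA_real_,
  expense_3 = c(NA, NA, NA, NA, 10.61, NA, NA, NA, NA),
  expense_4 = c(31.82, 4.75, NA, NA, NA, NA, NA, NA, NA)
)

# For every expense column, sum the values within each Year, ignoring NAs.
# The result is a matrix: one row per year, one column per expense type.
totals <- sapply(df[-1], function(col) tapply(col, df$Year, sum, na.rm = TRUE))
totals

# totals["1999", "expense_1"] is 42.24 (5 + 10.49 + 8.08 + 9.32 + 9.35)
```

Note that with na.rm = TRUE a category that is entirely NA for a year (expense_2 here) comes out as 0 rather than NA; the first answer below handles that case with a custom helper.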

2 answers:

Answer 0 (score: 2)

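You can use summarise_all to compute the sum of each expense column per year, then convert the data to long format, which makes plotting with ggplot easier: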
library(tidyverse)
library(scales)

df <- read.table(text = "Year    expense_1   expense_2   expense_3   expense_4
1999    5           NA          NA          31.82
2000    2           NA          NA          4.75
1999    10.49       NA          NA          NA
2000    39.69       NA          NA          NA
2000    NA          NA          10.61       NA
1999    8.08        NA          NA          NA
2000    16          NA          NA          NA
1999    9.32        NA          NA          NA
1999    9.35        NA          NA          NA",
                 header = TRUE, stringsAsFactors = FALSE)

# Define a summation function that returns NA if all values are NA.
# By default, sum(x, na.rm = TRUE) returns 0 when every value is NA.
# NA_real_ keeps the result type double, matching the non-NA sums.
sum_NA <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

df_long <- df %>% 
  group_by(Year) %>% 
  summarise_all(sum_NA) %>%   # funs() is deprecated since dplyr 0.8.0
  gather(key = "type", value = "expense", -Year)
df_long

#> # A tibble: 8 x 3
#>    Year type      expense
#>   <int> <chr>       <dbl>
#> 1  1999 expense_1   42.2 
#> 2  2000 expense_1   57.7 
#> 3  1999 expense_2   NA   
#> 4  2000 expense_2   NA   
#> 5  1999 expense_3   NA   
#> 6  2000 expense_3   10.6 
#> 7  1999 expense_4   31.8 
#> 8  2000 expense_4    4.75

ggplot(df_long, aes(x = Year, y = expense, color = type, group = type)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 1)) +
  theme_bw()

Created on 2018-05-21 by the reprex package (v0.2.0).
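The reason for the custom sum_NA() helper can be checked directly. This small sketch uses a copy of the helper (returning NA_real_) and two illustrative vectors that are not taken from the question's data, comparing it with plain sum(..., na.rm = TRUE):

```r
# Copy of the answer's helper: NA for an all-NA group, otherwise the NA-free sum
sum_NA <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

all_missing <- c(NA_real_, NA_real_)   # illustrative: a year with no data
mixed       <- c(10.49, NA, 8.08)      # illustrative: a year with some data

sum(all_missing, na.rm = TRUE)  # 0: looks like a genuine zero total
sum_NA(all_missing)             # NA: "no data" stays visibly missing
sum_NA(mixed)                   # 18.57
```

Because ggplot drops NA values (with a warning) when drawing, a category with no data for a year simply gets no point, instead of a misleading point at zero.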

Answer 1 (score: 0)

You can let ggplot do most of the work for you: just gather the data into long format, then plot:

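library(tidyverse)

df %>%
  gather(expense, value, -Year) %>%
  ggplot(aes(x = Year, y = value, color = expense)) +
  # stat = "summary" sums the values for each Year within each expense type
  # (the fun.y argument was renamed to fun in ggplot2 3.3.0)
  geom_line(stat = "summary", fun = sum)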
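To see what the summary line actually plots, the same aggregation can be reproduced without ggplot2. The sketch below is a base-R equivalent using stack() and aggregate() in place of gather(), on the question's sample rows. It mirrors the fact that ggplot2 removes rows with a missing y value before computing the summary, so an all-NA category such as expense_2 produces no line at all rather than a line at zero:

```r
# Sample data from the question
df <- data.frame(
  Year      = c(1999, 2000, 1999, 2000, 2000, 1999, 2000, 1999, 1999),
  expense_1 = c(5, 2, 10.49, 39.69, NA, 8.08, 16, 9.32, 9.35),
  expense_2 = NA_real_,
  expense_3 = c(NA, NA, NA, NA, 10.61, NA, NA, NA, NA),
  expense_4 = c(31.82, 4.75, NA, NA, NA, NA, NA, NA, NA)
)

# Long format (base-R analogue of gather()): stack() stacks the expense
# columns one after another, so Year is repeated once per column
long <- data.frame(
  Year = rep(df$Year, times = ncol(df) - 1),
  stack(df[-1])
)
names(long) <- c("Year", "value", "expense")

# Drop missing values first, as ggplot2 does before computing the summary
long <- long[!is.na(long$value), ]

# Sum per Year and expense type; combinations with no data are simply absent
totals <- aggregate(value ~ Year + expense, data = long, FUN = sum)
totals
```

Here expense_3 has a value only for 2000, so in the plot it is a single summarised point, which geom_line alone cannot draw as a line; adding a points layer would make it visible.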