This is my first attempt at a time series plot. I have a dataset of about 50k rows covering many years, structured like the sample below:
Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA
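Now I want to plot a time series with Year on the x-axis and Expense on the y-axis, with a separate line for each of expense_1, expense_2, expense_3 and expense_4. The expenses in each category should be summed by year, and NA values should be dropped.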
Answer 0 (score: 2)
You can compute the per-year sums with summarise_all, then convert the data to long format for plotting with ggplot:
library(tidyverse)
library(scales)
df <- read.table(text = "Year expense_1 expense_2 expense_3 expense_4
1999 5 NA NA 31.82
2000 2 NA NA 4.75
1999 10.49 NA NA NA
2000 39.69 NA NA NA
2000 NA NA 10.61 NA
1999 8.08 NA NA NA
2000 16 NA NA NA
1999 9.32 NA NA NA
1999 9.35 NA NA NA",
header = TRUE, stringsAsFactors = FALSE)
# Define a summation helper that returns NA when every value is NA.
# Plain sum(x, na.rm = TRUE) would return 0 in that case.
sum_NA <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}
df_long <- df %>%
  group_by(Year) %>%
  summarise_all(sum_NA) %>%
  gather(key = "type", value = "expense", -Year)
df_long
#> # A tibble: 8 x 3
#> Year type expense
#> <int> <chr> <dbl>
#> 1 1999 expense_1 42.2
#> 2 2000 expense_1 57.7
#> 3 1999 expense_2 NA
#> 4 2000 expense_2 NA
#> 5 1999 expense_3 NA
#> 6 2000 expense_3 10.6
#> 7 1999 expense_4 31.8
#> 8 2000 expense_4 4.75
ggplot(df_long, aes(x = Year, y = expense, color = type, group = type)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 1)) +
  theme_bw()
Created on 2018-05-21 by the reprex package (v0.2.0).
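The reason for the custom sum_NA helper in the answer above can be checked with a small base R sketch. It rebuilds the sample data by hand (the data.frame call is only an illustration, not part of the original answer) and compares a plain sum with na.rm = TRUE against sum_NA. The plain version reports 0 for a category that has no data at all, which would show up in the plot as a misleading line at zero.

```r
# Sample data from the question, typed in by hand for a self-contained check
df <- data.frame(
  Year      = c(1999, 2000, 1999, 2000, 2000, 1999, 2000, 1999, 1999),
  expense_1 = c(5, 2, 10.49, 39.69, NA, 8.08, 16, 9.32, 9.35),
  expense_2 = NA_real_,
  expense_3 = c(NA, NA, NA, NA, 10.61, NA, NA, NA, NA),
  expense_4 = c(31.82, 4.75, NA, NA, NA, NA, NA, NA, NA)
)

# Same idea as the helper in the answer: keep NA when a group has no data
sum_NA <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Plain sum with na.rm = TRUE: all-NA groups collapse to 0
sums_zero <- aggregate(df[-1], by = list(Year = df$Year),
                       FUN = sum, na.rm = TRUE)

# sum_NA: all-NA groups stay NA, so they can be left out of the plot
sums_na <- aggregate(df[-1], by = list(Year = df$Year), FUN = sum_NA)

print(sums_zero)
print(sums_na)
```

Note that the formula interface, aggregate(. ~ Year, df, sum), is not a drop-in replacement here: its default na.action removes every row that contains an NA, and in this sample that is all of them.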
Answer 1 (score: 0)
You can let ggplot do most of the work for you: just gather, then start plotting:
# Note: ggplot2 >= 3.3.0 calls this argument fun; older versions use fun.y
df %>%
  gather(expense, value, -Year) %>%
  ggplot(aes(x = Year, y = value, color = expense)) +
  geom_line(stat = "summary", fun = "sum")
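On a current tidyverse, the same approach might be written with pivot_longer in place of gather and an explicit stat_summary layer. The following is a sketch under those assumptions (tidyr >= 1.0 and ggplot2 >= 3.3.0), not the answerer's code; the data.frame is again typed in by hand from the question's sample. Setting na.rm = TRUE on the layer drops the NA rows silently before summing, so a category with no data simply gets no line.

```r
library(tidyr)
library(ggplot2)

# Sample data from the question, typed in by hand
df <- data.frame(
  Year      = c(1999, 2000, 1999, 2000, 2000, 1999, 2000, 1999, 1999),
  expense_1 = c(5, 2, 10.49, 39.69, NA, 8.08, 16, 9.32, 9.35),
  expense_2 = NA_real_,
  expense_3 = c(NA, NA, NA, NA, 10.61, NA, NA, NA, NA),
  expense_4 = c(31.82, 4.75, NA, NA, NA, NA, NA, NA, NA)
)

# Long format: one row per (Year, expense category) observation
df_long <- pivot_longer(df, cols = -Year,
                        names_to = "expense", values_to = "value")

# Let the summary stat add up the values for each year and category
p <- ggplot(df_long, aes(x = Year, y = value, color = expense)) +
  stat_summary(fun = sum, geom = "line", na.rm = TRUE)

# The summed values that the layer will draw can be inspected with layer_data()
plot_data <- layer_data(p)
print(plot_data[, c("x", "group", "y")])
```

With only two years in the sample, a category that has data in a single year (expense_3 here) yields one summed point, which geom "line" cannot connect. Adding a second stat_summary layer with geom = "point" is one way to keep such values visible.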