I want to analyze the Covid situation in 2020 versus 2021, and use ggplot to show how contagious the virus was in 2021.
df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c(
    "Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21", "Apr-21",
    "Oct-20", "Nov-20", "Dec-20"
  )
)
self_impact  impacted_family  month
Y            4                Jan-21
Y            0                Jan-21
Y            5                Feb-21
N            1                Jan-21
N            2                Mar-21
Y            0                Mar-21
Y            3                Apr-21
Y            0                Oct-20
Y            2                Nov-20
N            2                Dec-20
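In 2020 there are 2 self-impacted people (self_impact = "Y"), while in 2021 there are 5.
Of the 2 self-impacted people in 2020, one had family members infected, while in 2021, 3 of the 5 self-impacted people had family members infected.
In addition, the number of impacted family members is much higher in 2021 than in 2020.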
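The three figures above can be checked with a short base-R sketch. It assumes every year in `month` is in the 2000s, and it counts family members only for self-impacted rows (`self_impact == "Y"`), in line with the first two figures; the column names of `yearly` are chosen here for illustration.

```r
# Rebuild the sample data from the question
df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21",
            "Apr-21", "Oct-20", "Nov-20", "Dec-20")
)
# Year from the "Mon-YY" suffix (assumption: all years are 20xx)
df$year <- 2000 + as.integer(substr(df$month, 5, 6))
df$family_n <- as.integer(df$impacted_family)
# Only the self-impacted rows are used for the three figures
self_y <- df[df$self_impact == "Y", ]
yearly <- data.frame(
  year              = sort(unique(self_y$year)),
  self_impacted     = as.vector(table(self_y$year)),                     # people with self_impact == "Y"
  families_infected = as.vector(tapply(self_y$family_n > 0, self_y$year, sum)), # of those, with >= 1 infected relative
  family_members    = as.vector(tapply(self_y$family_n, self_y$year, sum))      # total impacted family members
)
print(yearly)
```

For the sample data this gives 2 and 5 self-impacted people, 1 and 3 of them with infected family, and 2 and 12 impacted family members for 2020 and 2021 respectively.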
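I'd like to show these three pieces of information in a stacked bar chart using ggplot, with a color option for each year.
Any help is appreciated, thanks!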
Answer 0 (score: 0)
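It is best to avoid showing 3 dimensions at once, because people tend to get lost in more than 2 dimensions. It would be better to draw 2 charts, one for each value of self_impact.
That said, you can summarise your data frame by year + self_impact and then plot it with facet_wrap to show all 3 dimensions.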
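A minimal sketch of the summarise-by-year-and-self_impact plus facet_wrap approach described above, assuming dplyr, tidyr and ggplot2 are installed. The summary columns `people`, `families_infected` and `family_members` are names chosen here for illustration, not taken from the original answer; the bars are stacked by self_impact, and each facet shows one of the three figures from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21",
            "Apr-21", "Oct-20", "Nov-20", "Dec-20")
)
yearly_long <- df %>%
  mutate(
    year = factor(2000 + as.integer(substr(month, 5, 6))),  # assumes 20xx years
    family_n = as.integer(impacted_family)
  ) %>%
  group_by(year, self_impact) %>%
  summarise(
    people = n(),                           # rows per year and self_impact
    families_infected = sum(family_n > 0),  # rows with at least one infected relative
    family_members = sum(family_n),         # total impacted family members
    .groups = "drop"
  ) %>%
  # One row per year, self_impact and metric, so each metric can get its own facet
  pivot_longer(c(people, families_infected, family_members),
               names_to = "metric", values_to = "value")
p <- ggplot(yearly_long, aes(x = year, y = value, fill = self_impact)) +
  geom_col() +                                # geom_col stacks by default
  facet_wrap(~ metric, scales = "free_y") +
  scale_fill_manual(values = c(N = "grey70", Y = "firebrick")) +
  labs(x = "Year", y = "Count", fill = "self_impact")
print(p)
```

Swapping `x = year` and `fill = self_impact` for `x = self_impact` and `fill = year` gives the per-year colors the question asks for, at the cost of no longer stacking the self-impact groups.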
Answer 1 (score: 0)
One approach is a line chart with the color mapped to self_impact, like this:
library(lubridate)
library(tidyverse)
# Graph by month
monthly_summary_data <- df %>%
  # Turn "Jan-21" into a Date on the first day of that month
  mutate(month_formatted = as.Date(paste0("01 ", month), format = "%d %b-%y")) %>%
  group_by(month_formatted, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    self_impact2 = n(),  # number of people per month and self_impact
    .groups = "drop"
  )
# There is very little data, so the plot at month level is mostly noise
ggplot(data = monthly_summary_data) +
  geom_line(aes(x = month_formatted, y = impacted_family,
                group = self_impact, color = self_impact))
# Graph by year
year_summary_data <- df %>%
  # Parse the month string, then keep only the year as a factor
  mutate(year =
           factor(year(as.Date(paste0("01 ", month), format = "%d %b-%y")))) %>%
  group_by(year, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    self_impact2 = n(),
    .groups = "drop"
  )
# With this small amount of data, a year-level graph is more informative
ggplot(data = year_summary_data) +
  geom_line(aes(x = year, y = impacted_family,
                group = self_impact, color = self_impact)) +
  # Start the y axis at zero
  scale_y_continuous(limits = c(0, NA))
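The `%b` format used above matches month abbreviations of the current `LC_TIME` locale, so `"Jan-21"` may fail to parse (giving `NA`) in a non-English session. A locale-independent alternative is sketched below; `parse_mon_yy` is a hypothetical helper, and it assumes the strings always look like `Mon-YY` with years in the 2000s.

```r
# Hypothetical helper: convert "Mon-YY" strings (e.g. "Jan-21") to Dates on the
# first of the month without %b, so the result does not depend on LC_TIME.
# Assumption: every year is in the 2000s.
parse_mon_yy <- function(x) {
  mon <- match(substr(x, 1, 3), month.abb)    # month.abb: R's built-in English abbreviations
  yr  <- 2000L + as.integer(substr(x, 5, 6))  # "21" -> 2021
  # An explicit format makes unrecognised strings NA instead of an error
  as.Date(sprintf("%04d-%02d-01", yr, mon), format = "%Y-%m-%d")
}
parse_mon_yy(c("Jan-21", "Oct-20", "Dec-20"))
# [1] "2021-01-01" "2020-10-01" "2020-12-01"
```

Inside the pipelines above it could replace the `as.Date(...)` calls, e.g. `mutate(month_formatted = parse_mon_yy(month))`.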
Using cumsum to reduce the noise in the monthly chart, and comparing the years through a linetype mapped to the year variable:
year_month_summary <- df %>%
  mutate(date = as.Date(paste0("01 ", month), format = "%d %b-%y"),
         year = factor(year(date)),
         month = month(date)) %>%
  group_by(year, month, self_impact) %>%
  summarise(
    impacted_family = sum(as.numeric(impacted_family)),
    .groups = "drop") %>%
  group_by(year, self_impact) %>%
  # Running total of impacted family members within each year and self_impact
  mutate(cum_impacted_family = cumsum(impacted_family)) %>%
  ungroup()
# The cumulative sum smooths the month-to-month noise, and linetype = year
# lets the two years be compared on the same month axis
ggplot(data = year_month_summary) +
  geom_line(aes(x = month, y = cum_impacted_family,
                group = paste0(year, self_impact),
                color = self_impact, linetype = year)) +
  # Start the y axis at zero
  scale_y_continuous(limits = c(0, NA)) +
  scale_x_continuous(breaks = 1:12, expand = c(0, 0))
Created on 2021-04-20 by the reprex package (v2.0.0)
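For readers without the tidyverse, the cumulative step of the last graph can be sketched in base R with `aggregate()` and `ave()`. This is an alternative to the dplyr pipeline in the answer, not the answerer's code; it assumes all years are in the 2000s and uses the built-in `month.abb` lookup instead of `%b`.

```r
df <- data.frame(
  self_impact = as.factor(c("Y", "Y", "Y", "N", "N", "Y", "Y", "Y", "Y", "N")),
  impacted_family = c("4", "0", "5", "1", "2", "0", "3", "0", "2", "2"),
  month = c("Jan-21", "Jan-21", "Feb-21", "Jan-21", "Mar-21", "Mar-21",
            "Apr-21", "Oct-20", "Nov-20", "Dec-20")
)
df$year     <- 2000L + as.integer(substr(df$month, 5, 6))  # assumes 20xx years
df$mon      <- match(substr(df$month, 1, 3), month.abb)     # month number 1..12
df$family_n <- as.integer(df$impacted_family)
# Total impacted family members per year, month and self_impact
monthly <- aggregate(family_n ~ year + mon + self_impact, data = df, FUN = sum)
# Calendar order inside each year/self_impact group, so cumsum runs month by month
monthly <- monthly[order(monthly$year, monthly$self_impact, monthly$mon), ]
# Running total within each year/self_impact group
monthly$cum_family <- ave(monthly$family_n, monthly$year, monthly$self_impact,
                          FUN = cumsum)
print(monthly)
```

For the sample data, the running total for `self_impact == "Y"` goes 4, 9, 9, 12 over January to April 2021 and 0, 2 over October to November 2020, which is the gap the linetype comparison is meant to show.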