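This is a small program I am working on to produce a final chart. I have 2 separate datasets. One is called T0, and the second contains all the data I have. I want the program to take each T0 value from the first data frame and then find the highest price in each of the 3 years before and the 3 years after the T0 year.
Essentially, I assign the T0 values myself (they are chosen arbitrarily). The program should then search my database automatically for the highest price in every year except the T0 year.
The problem I am facing is getting the T0 values into the program. When I run my code, nothing shows up.
The problem is clearly related to how I define T0. Should I use a for loop, or is there a small tweak I am missing?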
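On the for-loop question: for data this small, a plain base-R loop over the rows of T0data does work. The following is a minimal sketch, not the original code. It assumes data frames named T0data and Alldata with the columns Company, Year and Price as in the sample data (re-entered compactly so the snippet runs on its own); the names rows_out, result and Offset are illustrative.

```r
# Sample data, re-entered compactly (same values as the dput() output in the question)
T0data <- data.frame(
  Company = c("Amazon", "Cisco", "McDonald's"),
  Year    = c(2011, 2008, 2013),
  Price   = c(182, 21.82, 95.15),
  stringsAsFactors = FALSE
)
Alldata <- data.frame(
  Company = rep(c("Amazon", "McDonald's", "Cisco"), each = 18),
  Year = c(2008, 2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010,
           2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014, 2014,
           2008, 2010, 2010, 2010, 2011, 2011, 2012, 2012, 2013,
           2013, 2014, 2014, 2014, 2015, 2015, 2016, 2016, 2016,
           2005, 2005, 2005, 2006, 2006, 2007, 2007, 2007, 2008,
           2008, 2009, 2009, 2009, 2010, 2010, 2011, 2011, 2011),
  Price = c(91, 77, 81, 87, 63, 88, 110, 75, 117, 170, 190, 215, 245, 316, 275, 330, 378, 390,
            55, 62, 66, 65, 72, 98, 93, 88, 99, 101, 94, 103, 96, 99, 116, 112, 123, 113,
            19, 17, 18, 20, 19, 26, 31, 27, 24, 21, 14, 22, 18, 26, 22, 14, 16, 15),
  stringsAsFactors = FALSE
)

rows_out <- list()
for (i in seq_len(nrow(T0data))) {
  company <- as.character(T0data$Company[i])  # as.character() also copes with factor columns
  t0      <- T0data$Year[i]
  for (k in -3:3) {                            # offsets T-3 ... T+3
    if (k == 0) {
      # the T0 year keeps the price stored in T0data
      price <- T0data$Price[i]
    } else {
      # all Alldata rows for this company in year T0 + k; keep the highest price
      hit   <- as.character(Alldata$Company) == company & Alldata$Year == t0 + k
      price <- if (any(hit)) max(Alldata$Price[hit]) else NA_real_
    }
    rows_out[[length(rows_out) + 1]] <- data.frame(
      Company = company, Offset = k, Year = t0 + k, Price = price,
      stringsAsFactors = FALSE
    )
  }
}
result <- do.call(rbind, rows_out)  # one row per company per offset
print(result)
```

Each company's T0 row keeps its own price from T0data, and every other offset takes the maximum of the matching Alldata rows (NA if that year is missing). Collecting rows in a list and binding once with do.call(rbind, ...) avoids growing a data frame inside the loop.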
The final result should look like this:
Sample data:
T0data:
T0data <- structure(list(Company = structure(1:3, .Label = c("Amazon",
"Cisco", "McDonald's"), class = "factor"), Year = c(2011L, 2008L,
2013L), Price = c(182, 21.82, 95.15)), .Names = c("Company",
"Year", "Price"), row.names = c(NA, 3L), class = "data.frame")
Alldata:
Alldata <- structure(list(Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("Amazon", "Cisco", "McDonald's"), class = "factor"),
Year = c(2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2010L,
2010L, 2010L, 2011L, 2011L, 2012L, 2012L, 2013L, 2013L, 2014L,
2014L, 2014L, 2008L, 2010L, 2010L, 2010L, 2011L, 2011L, 2012L,
2012L, 2013L, 2013L, 2014L, 2014L, 2014L, 2015L, 2015L, 2016L,
2016L, 2016L, 2005L, 2005L, 2005L, 2006L, 2006L, 2007L, 2007L,
2007L, 2008L, 2008L, 2009L, 2009L, 2009L, 2010L, 2010L, 2011L,
2011L, 2011L), Price = c(91L, 77L, 81L, 87L, 63L, 88L, 110L,
75L, 117L, 170L, 190L, 215L, 245L, 316L, 275L, 330L, 378L,
390L, 55L, 62L, 66L, 65L, 72L, 98L, 93L, 88L, 99L, 101L,
94L, 103L, 96L, 99L, 116L, 112L, 123L, 113L, 19L, 17L, 18L,
20L, 19L, 26L, 31L, 27L, 24L, 21L, 14L, 22L, 18L, 26L, 22L,
14L, 16L, 15L)), .Names = c("Company", "Year", "Price"), class = "data.frame", row.names = c(NA,
-54L))
Answer (score: 0)
Here is a solution that uses the dplyr / tidyr packages from the tidyverse rather than data.table, but it should do the job:
library(dplyr); library(tidyr)

T0.modified <- T0data %>%
  # create year range based on each company's T0 year
  mutate(Year.M1 = Year - 1,
         Year.M2 = Year - 2,
         Year.M3 = Year - 3,
         Year.P1 = Year + 1,
         Year.P2 = Year + 2,
         Year.P3 = Year + 3) %>%
  # convert to long format, match with Alldata based on both company & year
  gather(reference.year, actual.year, -Company, -Price) %>%
  left_join(Alldata, by = c("Company" = "Company", "actual.year" = "Year")) %>%
  # keep T0 price for year T0, & use matched prices for all other years
  mutate(Price = ifelse(reference.year == "Year", Price.x, Price.y)) %>%
  # take maximum of all matched prices for each company each year
  group_by(Company, reference.year) %>%
  summarise(Price = max(Price)) %>%
  ungroup() %>%
  # order reference.year for correct sequence in ggplot's x-axis
  mutate(reference.year = factor(reference.year,
                                 levels = c("Year.M3", "Year.M2", "Year.M1", "Year",
                                            "Year.P1", "Year.P2", "Year.P3"),
                                 labels = c("T-3", "T-2", "T-1", "T0", "T+1", "T+2", "T+3")))
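gather() still works, but current tidyr marks it as superseded in favour of pivot_longer(). A minimal sketch of the same pipeline with pivot_longer(), assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for `.groups = "drop"`). The sample data is re-entered compactly so the snippet runs on its own; the final arrange() is an addition for readable output.

```r
library(dplyr)
library(tidyr)

# Sample data, re-entered compactly (same values as the dput() output in the question)
T0data <- data.frame(
  Company = c("Amazon", "Cisco", "McDonald's"),
  Year    = c(2011, 2008, 2013),
  Price   = c(182, 21.82, 95.15),
  stringsAsFactors = FALSE
)
Alldata <- data.frame(
  Company = rep(c("Amazon", "McDonald's", "Cisco"), each = 18),
  Year = c(2008, 2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010,
           2011, 2011, 2012, 2012, 2013, 2013, 2014, 2014, 2014,
           2008, 2010, 2010, 2010, 2011, 2011, 2012, 2012, 2013,
           2013, 2014, 2014, 2014, 2015, 2015, 2016, 2016, 2016,
           2005, 2005, 2005, 2006, 2006, 2007, 2007, 2007, 2008,
           2008, 2009, 2009, 2009, 2010, 2010, 2011, 2011, 2011),
  Price = c(91, 77, 81, 87, 63, 88, 110, 75, 117, 170, 190, 215, 245, 316, 275, 330, 378, 390,
            55, 62, 66, 65, 72, 98, 93, 88, 99, 101, 94, 103, 96, 99, 116, 112, 123, 113,
            19, 17, 18, 20, 19, 26, 31, 27, 24, 21, 14, 22, 18, 26, 22, 14, 16, 15),
  stringsAsFactors = FALSE
)

T0.modified <- T0data %>%
  # one column per year in the T-3 ... T+3 window
  mutate(Year.M1 = Year - 1, Year.M2 = Year - 2, Year.M3 = Year - 3,
         Year.P1 = Year + 1, Year.P2 = Year + 2, Year.P3 = Year + 3) %>%
  # pivot_longer() in place of gather(): one row per company per reference year
  pivot_longer(cols = starts_with("Year"),
               names_to = "reference.year", values_to = "actual.year") %>%
  # both tables have a Price column, so the join yields Price.x (T0) and Price.y (Alldata)
  left_join(Alldata, by = c("Company", "actual.year" = "Year")) %>%
  mutate(Price = ifelse(reference.year == "Year", Price.x, Price.y)) %>%
  group_by(Company, reference.year) %>%
  summarise(Price = max(Price), .groups = "drop") %>%
  mutate(reference.year = factor(reference.year,
                                 levels = c("Year.M3", "Year.M2", "Year.M1", "Year",
                                            "Year.P1", "Year.P2", "Year.P3"),
                                 labels = c("T-3", "T-2", "T-1", "T0", "T+1", "T+2", "T+3"))) %>%
  arrange(Company, reference.year)

print(T0.modified, n = 21)
```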
Plotting the result with ggplot2:
library(ggplot2)
ggplot(T0.modified,
       aes(x = reference.year, y = Price, group = Company, color = Company)) +
  geom_line() +
  xlab("Year") + theme_bw()
Edit: added the mean for each year using stat_summary:
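One way such a mean line could be drawn, as a minimal sketch rather than the answerer's exact code. It assumes ggplot2 3.3.0 or later (for the `fun` argument of stat_summary()) and uses the per-year maxima that the pipeline above produces from the sample data, typed in directly so the snippet runs on its own. The colour mapping sits only in geom_line() so that the summary layer, grouped with `group = 1`, averages across all companies at each reference year.

```r
library(ggplot2)

# Per-year maxima for the sample data (the values T0.modified holds after the pipeline)
lvls <- c("T-3", "T-2", "T-1", "T0", "T+1", "T+2", "T+3")
T0.modified <- data.frame(
  Company        = rep(c("Amazon", "Cisco", "McDonald's"), each = 7),
  reference.year = factor(rep(lvls, times = 3), levels = lvls),
  Price          = c(91, 88, 117, 182, 245, 316, 390,
                     19, 20, 31, 21.82, 22, 26, 16,
                     66, 98, 93, 95.15, 103, 116, 123)
)

p <- ggplot(T0.modified, aes(x = reference.year, y = Price)) +
  # one coloured line per company
  geom_line(aes(group = Company, colour = Company)) +
  # group = 1 pools all companies, so fun = mean gives one value per reference year
  stat_summary(aes(group = 1), fun = mean, geom = "line",
               colour = "black", linetype = "dashed") +
  xlab("Year") + theme_bw()
print(p)
```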