I have a dataset in which different cities enter and leave a program, as in this example dataset:
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
This gives you a data frame that looks like this:
Population  Enter.Program  Leave.Program
1000        15-10-01       16-06-01
2000        16-05-01       16-10-01
3000        16-07-01       17-08-01
First, I want to create an output table like this:
Per.Begin  Per.End   Total.Pop.In
15-10-01   16-04-30  1000
16-05-01   16-05-30  3000
16-06-01   16-06-30  2000
16-07-01   16-09-30  5000
16-10-01   17-07-30  3000
17-08-01   18-04-26  0
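Then I want to plot it in ggplot as a graph that looks like a step function, or a jagged rectangular surface, whose top edge is the running total. It is a bit like a cumulative density function, except that the y-axis can go down as well as up, and the steps along the x-axis have widths equal to the time periods.
Here are the steps I have blocked out, but I don't know how to carry them out: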
Answer 0 (score: 1)
Using dplyr (since you tagged the question with it), you can do what you want. The main things that need to be done are:
The code is below.
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
# Stack entries (+Population) and exits (-Population) into one table of changes
changes = example.dat %>%
  select("Population","Date"="Enter.Program") %>%
  bind_rows(example.dat %>%
    select("Population","Date"="Leave.Program") %>%
    mutate(Population = -1*Population)) %>%
  mutate(Date = as.Date(Date,"%y-%m-%d"))
startDate = min(changes$Date)
endDate = max(changes$Date)
# One row per day; days without a change add 0 to the running total
# (tibble() replaces the deprecated data_frame())
final = tibble(Date = seq(startDate,endDate,1)) %>%
  left_join(changes,by="Date") %>%
  mutate(Population = cumsum(ifelse(is.na(Population),0,Population)))
ggplot(data = final,aes(x=Date,y=Population)) +
  geom_line()
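The question also asks for a table of periods with the total population in each. The following is only a sketch of one way to build it with dplyr, not part of the original answer. The names `events`, `Change`, and `period.table` are made up for illustration, and it assumes each period ends the day before the next change. The last period is left open (`NA`), because the final end date shown in the question (18-04-26) cannot be derived from the data.

```r
library(dplyr)

example.dat <- data.frame(
  Population    = c(1000, 2000, 3000),
  Enter.Program = c("15-10-01", "16-05-01", "16-07-01"),
  Leave.Program = c("16-06-01", "16-10-01", "17-08-01"),
  stringsAsFactors = FALSE
)

# One row per event: +Population on entry, -Population on exit
# (`events` and `Change` are illustrative names, not from the original answer)
events <- bind_rows(
  transmute(example.dat, Date = Enter.Program, Change = Population),
  transmute(example.dat, Date = Leave.Program, Change = -Population)
) %>%
  mutate(Date = as.Date(Date, "%y-%m-%d"))

# Net change per date, running total, and the day before the next change
period.table <- events %>%
  group_by(Date) %>%
  summarise(Change = sum(Change)) %>%
  arrange(Date) %>%
  mutate(Total.Pop.In = cumsum(Change),
         Per.End = lead(Date) - 1) %>%   # NA for the last, still open period
  select(Per.Begin = Date, Per.End, Total.Pop.In)

print(period.table)
```

Grouping by `Date` before taking the running total means that two cities changing status on the same day collapse into a single period boundary.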
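Update
If you don't want a row for every date from the earliest to the latest, you can use a (blurgh) for loop to add just the rows needed to get a nice result. Here we loop through and duplicate every date after the first, pairing it with the preceding cumulative sum. The code isn't pretty, but the graph is.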
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
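# Stack entries and exits, sort by date, and take the running total
changes = example.dat %>%
  select("Population","Date"="Enter.Program") %>%
  bind_rows(example.dat %>%
    select("Population","Date"="Leave.Program") %>%
    mutate(Population = -1*Population)) %>%
  mutate(Date = as.Date(Date,"%y-%m-%d")) %>%
  arrange(Date) %>%
  mutate(Population = cumsum(Population))
# Walk backwards so earlier row indices stay valid; before each row i, insert a
# copy of its date carrying the previous total, which makes the line step
# (tibble() replaces the deprecated data_frame())
for(i in nrow(changes):2){
  changes = bind_rows(changes[1:(i-1),],
                      tibble(Population = changes$Population[i-1],Date = changes$Date[i]),
                      changes[i:nrow(changes),])
}
ggplot(data = changes,aes(x=Date,y=Population)) +
  geom_line()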
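As a possible alternative to the loop, ggplot2's `geom_step()` draws the same staircase directly from one row per change date. This swaps in a different geom from the one the answer uses, so treat it as a sketch. `step.plot` is a name made up for illustration, and the data preparation is rewritten slightly with `transmute()`.

```r
library(dplyr)
library(ggplot2)

example.dat <- data.frame(
  Population    = c(1000, 2000, 3000),
  Enter.Program = c("15-10-01", "16-05-01", "16-07-01"),
  Leave.Program = c("16-06-01", "16-10-01", "17-08-01"),
  stringsAsFactors = FALSE
)

# Entries add Population, exits subtract it; then sort and take the running total
changes <- bind_rows(
  transmute(example.dat, Date = Enter.Program, Population = Population),
  transmute(example.dat, Date = Leave.Program, Population = -Population)
) %>%
  mutate(Date = as.Date(Date, "%y-%m-%d")) %>%
  arrange(Date) %>%
  mutate(Population = cumsum(Population))

# geom_step() holds each total until the next change date,
# so no duplicated rows are needed
step.plot <- ggplot(changes, aes(x = Date, y = Population)) +
  geom_step()

print(step.plot)
```

With the default direction, each total is drawn horizontally up to the next date and then jumps vertically, which matches the shape the loop produces.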