I want to convert my data frame into a format suitable for a waterfall chart.
My data frame looks like this:
employee <- c('A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F')
revenue <- c(10, 20, 30, 40, 10, 40,
8, 10, 20, 50, 20, 10,
2, 5, 70, 30, 10, 50,
40, 8, 30, 40, 10, 40)
date <- as.Date(c('2017-03-01','2017-03-01','2017-03-01',
'2017-03-01','2017-03-01','2017-03-01',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-04','2017-03-04','2017-03-04',
'2017-03-04','2017-03-04','2017-03-04'))
df <- data.frame(date, employee, revenue, stringsAsFactors = FALSE)
date employee revenue
1 2017-03-01 A 10
2 2017-03-01 B 20
3 2017-03-01 C 30
4 2017-03-01 D 40
5 2017-03-01 E 10
6 2017-03-01 F 40
7 2017-03-02 A 8
8 2017-03-02 B 10
9 2017-03-02 C 20
10 2017-03-02 D 50
11 2017-03-02 E 20
12 2017-03-02 F 10
13 2017-03-03 A 2
14 2017-03-03 B 5
15 2017-03-03 C 70
16 2017-03-03 D 30
17 2017-03-03 E 10
18 2017-03-03 F 50
19 2017-03-04 A 40
20 2017-03-04 B 8
21 2017-03-04 C 30
22 2017-03-04 D 40
23 2017-03-04 E 10
24 2017-03-04 F 40
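How can I transform this data frame into a form I can use to draw a waterfall chart in ggplot2?
The amount column is the change in each employee's value from the previous day.
The end column is the start column plus the amount column.
The start column is the previous day's Total end value.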
The final data frame should look like this:
date employee start end amount total_for_day
1 2017-03-01 A 0 10 10 10
2 2017-03-01 B 0 20 20 20
3 2017-03-01 C 0 30 30 30
4 2017-03-01 D 0 40 40 40
5 2017-03-01 E 0 10 10 10
6 2017-03-01 F 0 40 40 40
7 2017-03-01 Total 0 150 150 150
8 2017-03-02 A 150 148 -2 8
9 2017-03-02 B 150 140 -10 10
10 2017-03-02 C 150 140 -10 20
11 2017-03-02 D 150 160 10 50
12 2017-03-02 E 150 160 10 20
13 2017-03-02 F 150 120 -30 10
14 2017-03-02 Total 150 118 -32 118
15 2017-03-03 A 118 112 -6 2
16 2017-03-03 B 118 113 -5 5
17 2017-03-03 C 118 168 50 70
18 2017-03-03 D 118 98 -20 30
19 2017-03-03 E 118 108 -10 10
20 2017-03-03 F 118 158 40 50
21 2017-03-03 Total 118 167 49 167
22 2017-03-04 A 167 205 38 40
23 2017-03-04 B 167 170 3 8
24 2017-03-04 C 167 127 -40 30
25 2017-03-04 D 167 177 10 40
26 2017-03-04 E 167 167 0 10
27 2017-03-04 F 167 157 -10 40
28 2017-03-04 Total 167 168 1 168
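As a cross-check of the column definitions above, here is a minimal base R sketch (no dplyr) that builds a start/end/amount/total_for_day table. It rests on four assumptions. First, amount is the change in an employee's revenue versus that employee's previous day, or the revenue itself on the first day. Second, start is the previous day's Total, or 0 on the first day. Third, end = start + amount. Fourth, total_for_day is the row's own revenue, which is the day's sum for the Total rows. The names day_total, totals, long, prev_total and out are illustrative only.

```r
# Example data from the question
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(c("A", "B", "C", "D", "E", "F"), times = 4),
  revenue = c(10, 20, 30, 40, 10, 40,
              8, 10, 20, 50, 20, 10,
              2, 5, 70, 30, 10, 50,
              40, 8, 30, 40, 10, 40),
  stringsAsFactors = FALSE
)

# One "Total" row per day holding that day's summed revenue
day_total <- tapply(df$revenue, df$date, sum)
totals <- data.frame(date = as.Date(names(day_total)),
                     employee = "Total",
                     revenue = as.vector(day_total),
                     stringsAsFactors = FALSE)
long <- rbind(df, totals)

# amount: change versus the same employee's previous day (first day: the revenue itself)
long <- long[order(long$employee, long$date), ]
long$amount <- ave(long$revenue, long$employee,
                   FUN = function(x) x - c(0, head(x, -1)))

# start: the previous day's Total (0 on the first day); end = start + amount
prev_total <- setNames(c(0, head(as.vector(day_total), -1)), names(day_total))
long$start <- unname(prev_total[as.character(long$date)])
long$end <- long$start + long$amount
long$total_for_day <- long$revenue

# Order rows by date, with the Total row last within each day
out <- long[order(long$date, long$employee),
            c("date", "employee", "start", "end", "amount", "total_for_day")]
rownames(out) <- NULL
print(out)
```

In this sketch, ave() applies the lagged difference within each employee. It plays the role that group_by() followed by mutate() plays in dplyr.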
Answer (score: 3)
There are a few steps to get there, and I think the dplyr package will help (it is used heavily below).
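My understanding is that revenue gives the cumulative total revenue rather than the daily change. If that is wrong, you will need to undo some of these calculations.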
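To make the two readings concrete, here is a small base R sketch that is not part of the original answer. If revenue were a daily increment, a per-employee cumsum() would give the running total. A per-employee diff() recovers the increments again. The column names cumulative and increment are hypothetical.

```r
# Example data from the question, one row per employee per day
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(c("A", "B", "C", "D", "E", "F"), times = 4),
  revenue = c(10, 20, 30, 40, 10, 40,
              8, 10, 20, 50, 20, 10,
              2, 5, 70, 30, 10, 50,
              40, 8, 30, 40, 10, 40),
  stringsAsFactors = FALSE
)
df <- df[order(df$employee, df$date), ]

# Reading revenue as a daily increment: running total per employee
df$cumulative <- ave(df$revenue, df$employee, FUN = cumsum)

# Going back: keep the first value, then take day-to-day differences, per employee
df$increment <- ave(df$cumulative, df$employee,
                    FUN = function(x) c(x[1], diff(x)))

print(df[df$employee == "A", ])
```

The answer reads revenue as a value that is already running. Under that reading, only the diff() direction is needed, and that is what the change column computes.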
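The first step is to create a new data.frame that calculates the daily totals and then bind it back onto the original data.frame. You can then group_by employee (including "Total") and add the columns that are computed separately for each employee: the previous day's value, the change, and whether the value increased or decreased.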
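library(dplyr)

toPlot <-
bind_rows(
df
# append one "Total" row per day holding that day's summed revenue
, df %>%
group_by(date) %>%
summarise(revenue = sum(revenue)) %>%
mutate(employee = "Total")
) %>%
# the columns below are computed separately for each employee (and for Total)
group_by(employee) %>%
mutate(
previousDay = lag(revenue, default = 0)  # previous day's value, 0 on the first day
, change = revenue - previousDay
, direction = ifelse(change > 0
, "Positive"
, "Negative"))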
Returns:
date employee revenue previousDay change direction
<date> <chr> <dbl> <dbl> <dbl> <chr>
1 2017-03-01 A 10 0 10 Positive
2 2017-03-01 B 20 0 20 Positive
3 2017-03-01 C 30 0 30 Positive
4 2017-03-01 D 40 0 40 Positive
5 2017-03-01 E 10 0 10 Positive
6 2017-03-01 F 40 0 40 Positive
7 2017-03-02 A 8 10 -2 Negative
8 2017-03-02 B 10 20 -10 Negative
9 2017-03-02 C 20 30 -10 Negative
10 2017-03-02 D 50 40 10 Positive
# ... with 18 more rows
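If dplyr is not available, the same toPlot columns can be sketched in base R. This is an equivalent I am assuming, not the answer's code. In it, tapply() builds the daily totals and rbind() appends them as "Total" rows. ave() then does the per-employee lag that group_by() plus lag() does in the dplyr version.

```r
# Example data from the question
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(c("A", "B", "C", "D", "E", "F"), times = 4),
  revenue = c(10, 20, 30, 40, 10, 40,
              8, 10, 20, 50, 20, 10,
              2, 5, 70, 30, 10, 50,
              40, 8, 30, 40, 10, 40),
  stringsAsFactors = FALSE
)

# Daily totals appended as "Total" rows
day_total <- tapply(df$revenue, df$date, sum)
toPlot <- rbind(df, data.frame(date = as.Date(names(day_total)),
                               employee = "Total",
                               revenue = as.vector(day_total),
                               stringsAsFactors = FALSE))

# Per-employee previous value (0 on the first day), change and direction
toPlot <- toPlot[order(toPlot$employee, toPlot$date), ]
toPlot$previousDay <- ave(toPlot$revenue, toPlot$employee,
                          FUN = function(x) c(0, head(x, -1)))
toPlot$change <- toPlot$revenue - toPlot$previousDay
toPlot$direction <- ifelse(toPlot$change > 0, "Positive", "Negative")

# Same row order as the dplyr output: original rows first, then the Total rows
toPlot <- toPlot[order(toPlot$employee == "Total", toPlot$date, toPlot$employee), ]
rownames(toPlot) <- NULL
print(head(toPlot, 10))
```

Like the dplyr version, a change of exactly 0 is labelled "Negative" here, because the test is change > 0.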
Then we can plot it using:
library(ggplot2)

# each day is a rectangle spanning from the previous day's value to the current one
toPlot %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
facet_wrap(~employee
, scales = "free_y") +
scale_fill_brewer(palette = "Set1")
This gives one waterfall panel per employee, plus one for "Total".
Note that including "Total" throws off the scales (free scales are needed), so I would rather omit it:
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
facet_wrap(~employee) +
scale_fill_brewer(palette = "Set1")
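With a common y scale across the panels, this allows direct comparison between employees.
And here is the Total on its own: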
toPlot %>%
filter(employee == "Total") %>%
ggplot(aes(xmin = date - 0.5
, xmax = date + 0.5
, ymin = previousDay
, ymax = revenue
, fill = direction)) +
geom_rect(col = "black"
, show.legend = FALSE) +
scale_fill_brewer(palette = "Set1")
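Why do the Total rectangles join up into a waterfall? Each bar runs from previousDay to revenue, so the top of one day's bar is the bottom of the next. The running sum of the changes therefore reproduces the daily totals. The small base R sketch below is an illustration I am adding, not part of the answer. It computes the same xmin/xmax/ymin/ymax values that the ggplot2 call maps and checks both properties.

```r
revenue <- c(10, 20, 30, 40, 10, 40,
             8, 10, 20, 50, 20, 10,
             2, 5, 70, 30, 10, 50,
             40, 8, 30, 40, 10, 40)
dates <- as.Date("2017-03-01") + 0:3

# Daily totals (the "Total" series) and the previous day's value
total <- as.vector(tapply(revenue, rep(dates, each = 6), sum))
previousDay <- c(0, head(total, -1))
change <- total - previousDay

# Rectangle corners as mapped in the ggplot2 call: one day wide, centred on the date
bars <- data.frame(xmin = dates - 0.5, xmax = dates + 0.5,
                   ymin = previousDay, ymax = total,
                   direction = ifelse(change > 0, "Positive", "Negative"))
print(bars[, c("ymin", "ymax", "direction")])

# Consecutive bars share an edge, and the changes telescope back to the totals
stopifnot(all(head(bars$ymax, -1) == tail(bars$ymin, -1)))
stopifnot(all(cumsum(change) == total))
```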
That said, I still find a line plot easier to understand (especially for comparing employees):
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(x = date
, y = revenue
, col = employee)) +
geom_line() +
scale_color_brewer(palette = "Dark2")
If you want to plot the day-to-day changes instead, you can do this:
toPlot %>%
filter(employee != "Total") %>%
ggplot(aes(x = date
, y = change
, fill = employee)) +
geom_col(position = "dodge") +
scale_fill_brewer(palette = "Dark2")
This gives a dodged bar chart of each employee's change on each day.
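But now you are moving away from the "waterfall" plot output. If you really, really want waterfalls that can be compared side by side, it is possible, but it gets fairly ugly (I strongly, strongly recommend the line plot above instead).
Here you have to position the boxes manually, and you will need some tweaking if you change the output aspect ratio (or size) or the number of employees. You also need colors for both the employee and the direction of change, which starts to get messy. This falls under "can, but probably shouldn't"; there is probably a better way to display this data.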
toPlot %>%
filter(employee != "Total") %>%
ungroup() %>%
mutate(empNumber = as.numeric(as.factor(employee))) %>%
ggplot(aes(xmin = (empNumber) - 0.4
, xmax = (empNumber) + 0.4
, ymin = previousDay
, ymax = revenue
, col = direction
, fill = employee)) +
geom_rect(size = 1.5) +
facet_grid(~date) +
scale_fill_brewer(palette = "Dark2") +
theme(axis.text.x = element_blank()
, axis.ticks.x = element_blank())
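This gives one panel per date, with a box for each employee (fill shows the employee, outline color shows the direction of change).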