From R

Time: 2018-12-17 19:02:57

Tags: r

I have the following data:

ID | Category (1-5) | Task1(in min) | Task2(in min) | Task3(in min)

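I want to create a histogram with the different categories on the x-axis and the cumulative duration of Tasks 1, 2, and 3 (colored accordingly) on the y-axis.

Is this possible in R without changing the original data? It seems ggplot only takes one column, not multiple columns.

Edit: My (rather poor) attempt is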

library(ggplot2)
ggplot(dataset) + geom_col(aes(x=Category, y=Task1, fill=Task2))

I can't work out how to fill by multiple columns.

Here is the output for the sample data:

dataset <- structure(list(ID = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Category = c("5 - Expert", "2 - Novice", "3 - Intermediate", "5 - Expert", "2 - Novice", "3 - Intermediate", "3 - Intermediate", "3 - Intermediate", "2 - Novice", "3 - Intermediate", "2 - Novice", "4 - Advanced", "2 - Novice", "3 - Intermediate", "2 - Novice", "5 - Expert", "4 - Advanced", "2 - Novice", "2 - Novice", "3 - Intermediate"), Task1 = structure(c(300, 360, 240, 180, 180, 240, 240, 360, 300, 300, 180, 360, 240, 240, 240, 300, 240, 240, 240, 240), class = c("hms", "difftime"), units = "secs"), Task2 = structure(c(480, 360, 660, 420, 660, 240, 660, 540, 780, 360, 540, 720, 360, 480, 540, 300, 420, 600, 240, 660), class = c("hms", "difftime"), units = "secs"), Task3 = structure(c(360, 480, 240, 300, 240, 240, 240, 240, 240, 180, 240, 180, 120, 120, 240, 240, 240, 240, 300, 240), class = c("hms", "difftime"), units = "secs")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

3 answers:

Answer 0: (score: 1)

You are very close. Make the data long. Here is a solution using ggplot.

library(tidyverse)
dataset <- structure(list(ID = c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), Category = c("5 - Expert", "2 - Novice", "3 - Intermediate", "5 - Expert", "2 - Novice", "3 - Intermediate", "3 - Intermediate", "3 - Intermediate", "2 - Novice", "3 - Intermediate", "2 - Novice", "4 - Advanced", "2 - Novice", "3 - Intermediate", "2 - Novice", "5 - Expert", "4 - Advanced", "2 - Novice", "2 - Novice", "3 - Intermediate"), Task1 = structure(c(300, 360, 240, 180, 180, 240, 240, 360, 300, 300, 180, 360, 240, 240, 240, 300, 240, 240, 240, 240), class = c("hms", "difftime"), units = "secs"), Task2 = structure(c(480, 360, 660, 420, 660, 240, 660, 540, 780, 360, 540, 720, 360, 480, 540, 300, 420, 600, 240, 660), class = c("hms", "difftime"), units = "secs"), Task3 = structure(c(360, 480, 240, 300, 240, 240, 240, 240, 240, 180, 240, 180, 120, 120, 240, 240, 240, 240, 300, 240), class = c("hms", "difftime"), units = "secs")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

dataset_long <- dataset %>% gather(task, value, Task1:Task3)

ggplot(dataset_long) + geom_col(aes(x = Category, y = value, fill = task))

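Created on 2018-12-18 by the reprex package (v0.2.1)

I hope this is close to your desired output. It doesn't change the original data, but working in R does call for some flexibility in reshaping data. I'd guess that getting data into the right shape is about 95% of the work in any analysis or visualization task in R.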
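As a hedged aside: gather() still works, but tidyr has since marked it as superseded by pivot_longer() (available from tidyr 1.0.0). The sketch below shows the same reshaping on a made-up four-row miniature of the question's data. The `toy` data frame and its plain-seconds durations are assumptions for illustration, not the asker's actual hms columns.

```r
library(tidyr)
library(ggplot2)

# Hypothetical miniature of the question's data; durations are plain seconds
toy <- data.frame(
  ID = 1:4,
  Category = c("2 - Novice", "2 - Novice", "5 - Expert", "5 - Expert"),
  Task1 = c(360, 180, 300, 180),
  Task2 = c(360, 660, 480, 420),
  Task3 = c(480, 240, 360, 300)
)

# One row per (ID, task) pair instead of one column per task
toy_long <- pivot_longer(toy, cols = Task1:Task3,
                         names_to = "task", values_to = "value")

# geom_col() stacks rows that share an x value, so each bar's
# total height is the cumulative duration for that category
p <- ggplot(toy_long) +
  geom_col(aes(x = Category, y = value / 60, fill = task)) +
  labs(y = "Cumulative duration (min)")
```

Printing `p` draws the stacked bars. The original `toy` data frame is left untouched, since the long version lives in a separate object.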

Answer 1: (score: 0)

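library(tidyverse)
# Sum the three task durations per row, then plot the distribution of the totals
df %>% mutate(task_composite = Task1 + Task2 + Task3) %>%
  ggplot(aes(task_composite)) + geom_histogram()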

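Answer 2: (score: 0)

I don't think you want a histogram. A histogram is a frequency distribution, with counts on the y-axis and some continuous variable on the x-axis. So you are really only plotting one variable.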
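To make the distinction concrete, here is a small sketch using base R's hist() on the Task2 durations from the question, converted to minutes by hand. The bin edges are an arbitrary choice made for illustration.

```r
# Task2 durations from the question's sample data, converted from seconds to minutes
task2_min <- c(480, 360, 660, 420, 660, 240, 660, 540, 780, 360,
               540, 720, 360, 480, 540, 300, 420, 600, 240, 660) / 60

# Bin edges are an assumption chosen for illustration (2-minute bins)
h <- hist(task2_min, breaks = seq(4, 14, by = 2), plot = FALSE)

# A histogram summarises ONE continuous variable:
# each count is the number of observations that fall in a bin
print(data.frame(from = head(h$breaks, -1),
                 to = h$breaks[-1],
                 count = h$counts))
```

Note that Category never enters the picture here, which is why a bar chart with categories on the x-axis is a different kind of plot.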

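To get the categories on the x-axis and the cumulative time on the y-axis, you want to use geom_bar(). Since each category gets its own bar on the x-axis, you don't need to color them separately, but I used the fill=Category argument in the aes() wrapper of ggplot() just to illustrate the functionality.

Example data frame:

df <- data.frame(Category = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
                 Task1 = rnorm(5,7,0.5),
                 Task2 = rnorm(5,8,0.5),
                 Task3 = rnorm(5,9,0.5))

Example solution: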
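The solution code itself does not appear in this copy of the answer. As a stand-in, here is a minimal sketch of what the description suggests; it is not the answerer's original code. It sums the three task columns inside aes(), so the data frame is left unchanged, and it uses geom_bar(stat = "identity") so that bar heights come from the data rather than from row counts. The set.seed() call is an added assumption, included only to make the random sample reproducible.

```r
library(ggplot2)

set.seed(42)  # assumption: seed added only for reproducibility
df <- data.frame(Category = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"),
                 Task1 = rnorm(5, 7, 0.5),
                 Task2 = rnorm(5, 8, 0.5),
                 Task3 = rnorm(5, 9, 0.5))

# stat = "identity" makes geom_bar() use y as the bar height
# instead of counting rows; the sum is computed inside aes()
p <- ggplot(df, aes(x = Category, y = Task1 + Task2 + Task3, fill = Category)) +
  geom_bar(stat = "identity") +
  labs(y = "Cumulative time")
```

Printing `p` draws one bar per category. To split each bar by task instead, the data would need to be in long form first, as in Answer 0.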