Visualizing multivariate data with aesthetic mappings in ggplot2

Posted: 2018-10-16 19:08:23

Tags: r ggplot2

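I am trying to make a geom_point plot in ggplot2 from multivariate data, and I am having trouble color-coding the data and plotting it. My data is shared below. I am interested in effort (x-axis) versus hair change (y-axis), with the points color-coded by hair type (the type of hair loss: diffuse, frontal/temporal and/or vertex). The survey is multivariate by nature: participants could endorse more than one hair loss type (HairType1, 2 and/or 3). Here is the code for the first 20 participants: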

Figure3Data = structure(list(MonthsMassage = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), 
MinutesPerDayMassage = c("0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", 
"11-20 minutes daily", "11-20 minutes daily", "11-20 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily", "0-10 minutes daily", "0-10 minutes daily", 
"0-10 minutes daily"), Minutes = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 15, 15, 15, 5, 5, 5, 5, 5, 5, 5), hairchange = c(-1, -1, 0, 
-1, 0, -1, -1, 0, 0, -1, 0, -1, -1, 0, 0, -1, 0, -1, 0, -1), 
HairType1 = c("Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"other", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal", 
"Templefrontal"), HairType2 = c("other", "other", "other", 
"other", "other", "other", "other", "other", "other", "Vertexthinning", 
"Vertexthinning", "other", "Vertexthinning", "other", "other", 
"Vertexthinning", "other", "Vertexthinning", "Vertexthinning", 
"other"), HairType3 = c("other", "Diffusethinning", "other", 
"Diffusethinning", "other", "other", "Diffusethinning", "Diffusethinning", 
"Diffusethinning", "other", "Diffusethinning", "Diffusethinning", 
"other", "other", "Diffusethinning", "Diffusethinning", "other", 
"Diffusethinning", "Diffusethinning", "Diffusethinning"), 
Effort = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.5, 2.5, 
2.5, 2.5, 2.5, 2.5, 2.5), EffortGroup = c("<5", "<5", "<5", 
"<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", "<5", 
"<5", "<5", "<5", "<5", "<5", "<5", "<5")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

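Because participants endorsed hair types that are spread across several columns, I cannot visually separate the data with the following code: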

ggplot(Figure3Data, aes(x = Effort, y = hairchange, color = hairtype????)) + geom_point()

If the hair loss type were somehow stored in a single column, it would be easy to see:

[Figure: geom_point plot with hair loss type in a single column]

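So I am wondering whether there is a way to organize the data so that all three hair loss types can be visualized and color-coded. I tried reshape2 and melt without any luck. I would like to avoid creating a fourth category for "multiple types reported", because lumping so many participants together would hide the insight I am trying to get.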
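Since melt() from reshape2 was already tried, here is a minimal sketch of how it can stack the three HairType columns into one, assuming the reshape2 and ggplot2 packages are installed. It uses only the first four participants of Figure3Data, and the column names `location` and `hairtype` plus the jitter amounts are illustrative choices, not something from the question.

```r
library(reshape2)
library(ggplot2)

# First four participants from the question's Figure3Data (subset for brevity)
hair <- data.frame(
  Effort     = c(0, 0, 0, 0),
  hairchange = c(-1, -1, 0, -1),
  HairType1  = c("Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal"),
  HairType2  = c("other", "other", "other", "other"),
  HairType3  = c("other", "Diffusethinning", "other", "Diffusethinning"),
  stringsAsFactors = FALSE
)

# Stack the three hair type columns: one row per participant x column
hair_long <- melt(
  hair,
  id.vars       = c("Effort", "hairchange"),
  measure.vars  = c("HairType1", "HairType2", "HairType3"),
  variable.name = "location",  # which original column the value came from
  value.name    = "hairtype"   # the endorsed hair loss type
)

# Drop the "other" placeholder so every remaining row is a real endorsement
hair_long <- hair_long[hair_long$hairtype != "other", ]

# Color by hair type; jitter so identical (Effort, hairchange) pairs stay visible
p <- ggplot(hair_long, aes(Effort, hairchange, color = hairtype)) +
  geom_jitter(width = 0.05, height = 0.05)
```

A participant who endorsed two types now contributes two rows (one per color), so no combined "multiple types" category is needed.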

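Alternatively, suggestions for other ways to plot this data (density or line plots) would be much appreciated. One idea I had was to make four separate line plots, one per hair loss category (i.e. average, diffuse, vertex, temporal), with "Effort" on the x-axis and the mean perceived hair change on the y-axis.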
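The four-line-plot idea can be sketched with dplyr, tidyr and ggplot2, assuming tidyr 1.0.0 or later (for pivot_longer()) and dplyr 1.0.0 or later (for `.groups`). The data is a subset of the question's rows (participants 1 to 4 at Effort 0 and 14 to 17 at Effort 2.5). The "Average" panel counts each participant once, while each hair type panel averages only the participants who endorsed that type; the names `type_means`, `overall_means` and `plot_data` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Participants 1-4 and 14-17 from the question's Figure3Data
hair <- tibble(
  Effort     = c(0, 0, 0, 0, 2.5, 2.5, 2.5, 2.5),
  hairchange = c(-1, -1, 0, -1, 0, 0, -1, 0),
  HairType1  = "Templefrontal",
  HairType2  = c("other", "other", "other", "other",
                 "other", "other", "Vertexthinning", "other"),
  HairType3  = c("other", "Diffusethinning", "other", "Diffusethinning",
                 "other", "Diffusethinning", "Diffusethinning", "other")
)

# One row per participant x endorsed hair type ("other" dropped)
by_type <- hair %>%
  pivot_longer(HairType1:HairType3, names_to = "location", values_to = "hairtype") %>%
  filter(hairtype != "other")

# Mean perceived hair change per Effort level within each hair type
type_means <- by_type %>%
  group_by(hairtype, Effort) %>%
  summarise(mean_change = mean(hairchange), .groups = "drop")

# "Average" panel: mean over all participants, each counted once
overall_means <- hair %>%
  group_by(Effort) %>%
  summarise(mean_change = mean(hairchange), .groups = "drop") %>%
  mutate(hairtype = "Average")

plot_data <- bind_rows(overall_means, type_means)

# One small line plot per category
p <- ggplot(plot_data, aes(Effort, mean_change)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ hairtype) +
  labs(x = "Effort", y = "Mean perceived hair change")
```

With only a few Effort levels each panel has just a handful of points, so the counts behind each mean are worth reporting alongside the plot.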

2 Answers

Answer 0 (score: 0)

You can create a whole new column that combines the three hair types by pasting the fifth, sixth and seventh columns together into a new "CombinedHair" column.


I used the following snippet to rebuild the data as a data.table:

library(ggplot2)
library(data.table)
# same 20 rows as Figure3Data in the question
dt <- as.data.table(Figure3Data)

If you plot this table as-is, it suffers from overplotting, because many participants share exactly the same Effort and hairchange values, so jittering the points helps. The combined column itself is created with:

dt[, CombinedHair := do.call(paste0, .SD), .SDcols = c(5, 6, 7)]  # columns 5-7 are HairType1..HairType3

If you want nicer class names, you can replace "other" with empty quotes before pasting.

[Figure: plot generated with the code above]
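With plain paste0(), a row like the first participant's becomes "Templefrontalotherother". A minimal sketch of the nicer-names idea, assuming the data.table and ggplot2 packages, joins only the endorsed (non-"other") types with " + " and labels rows with no endorsed type as "none" (a hypothetical label). The answer does not show its plotting call, so the geom_jitter() line and its width/height values are assumptions; only the first four participants are used.

```r
library(data.table)
library(ggplot2)

# First four participants from the question's data (subset for brevity)
dt <- data.table(
  Effort     = c(0, 0, 0, 0),
  hairchange = c(-1, -1, 0, -1),
  HairType1  = c("Templefrontal", "Templefrontal", "Templefrontal", "Templefrontal"),
  HairType2  = c("other", "other", "other", "other"),
  HairType3  = c("other", "Diffusethinning", "other", "Diffusethinning")
)
hair_cols <- c("HairType1", "HairType2", "HairType3")

# Row by row: keep the endorsed types and join them with " + "
dt[, CombinedHair := apply(.SD, 1, function(types) {
  endorsed <- types[types != "other"]
  if (length(endorsed) == 0) "none" else paste(endorsed, collapse = " + ")
}), .SDcols = hair_cols]

# Jitter because several participants share identical (Effort, hairchange)
p <- ggplot(dt, aes(Effort, hairchange, color = CombinedHair)) +
  geom_jitter(width = 0.05, height = 0.05)
```

This still produces combination categories such as "Templefrontal + Diffusethinning", which the question hoped to avoid; the long-format approach in the next answer keeps one color per hair type instead.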

Answer 1 (score: 0)

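Here is one way to move the location (which HairType column a value came from) into its own variable. It is not used here, but you could map it to facets, point shape or another aesthetic if needed. The points are then colored by hair type, with the "other" entries removed: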

library(tidyverse)
Figure3Data_long <- Figure3Data %>%
  gather(location, hairtype, HairType1:HairType3) %>%
  filter(hairtype != "other")

ggplot(Figure3Data_long,
       aes(Effort, hairchange, color = hairtype)) +
  # geom_point() +  
  geom_jitter(width = 0.03, height = 0.01)  # small jitter so overlapping points become visible

[Figure: resulting plot]
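gather() still works, but tidyr 1.0.0 and later supersede it with pivot_longer(). Below is a minimal sketch of the same reshaping with pivot_longer(), assuming dplyr, tidyr and ggplot2 are installed and using only the first four participants. As a swapped-in alternative to jitter, it uses geom_count(), which sizes each point by the number of rows sharing that position, and position_dodge() so that different hair types at the same spot sit side by side; the dodge width of 0.2 is an arbitrary choice.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# First four participants from the question's Figure3Data (subset for brevity)
hair <- tibble(
  Effort     = c(0, 0, 0, 0),
  hairchange = c(-1, -1, 0, -1),
  HairType1  = "Templefrontal",
  HairType2  = "other",
  HairType3  = c("other", "Diffusethinning", "other", "Diffusethinning")
)

# pivot_longer() replaces gather(): one row per participant x HairType column
hair_long <- hair %>%
  pivot_longer(HairType1:HairType3, names_to = "location", values_to = "hairtype") %>%
  filter(hairtype != "other")

# geom_count() draws one point per distinct (x, y, colour) with size = count,
# so the amount of overplotting is shown explicitly rather than jittered away
p <- ggplot(hair_long, aes(Effort, hairchange, color = hairtype)) +
  geom_count(position = position_dodge(width = 0.2))
```

Here Templefrontal at (0, -1) is drawn once with a count of 3, which a plain geom_point() would show as a single undistinguished dot.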