How can I set up an x,y plot of categorical data in R where each point also shows the standard deviation?

Asked: 2019-01-29 15:16:45

Tags: r plot settings boxplot

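I need to show the before/after change in a series of social network metrics. The idea is that each point has an x coordinate and a y coordinate, where a coordinate is the mean for one type of social actor and the lines through the point represent the standard deviation.

For example: at the "before" moment we have 4 social actors of type "public institution", and at the "after" moment we have 6 (some are the same and others are new, but that does not matter, because I am trying to describe the structure rather than the individual nodes). The mean and standard deviation are computed from that sample, and what I want the plot to show is which actor types "increase" or "decrease" on the different metrics.

Currently my data set looks like this (suggestions for restructuring it are welcome, but I think it can be handled as it is).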

    time category    code     Clossenness
    1         PI     PI1          0,658
    1         PI     PI2          0,568
    1         PI     PI3          0,581
    1         PI     PI4          0,595
    1         PI     PI5          0,556
    1         PrI    PrI1         0,658
    1         PrI    PrI2         0,543
    1         NGO's  NGO1         0,568
    1         NGO's  NGO2         0,581
    2         PI     PI1          0,611
    2         PI     PI6          0,600
    2         PI     PI7          0,485
    2         PI     PI8          0,569
    2         PI     PI9          0,579
    2         PI     PI10         0,635
    2         PI     PI11         0,623
    2         PI     PI12         0,623
    2         PI     PI13         0,673
    2         PrI    PrI1         0,673
    2         PrI    PrI3         0,600
    2         NGO's  NGO1         0,750
    2         NGO's  NGO3         0,508
    2         NGO's  NGO4         0,524

structure(list(structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"2"), class = "factor"), `time category` = structure(c(2L, 2L, 2L, 
2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 1L, 1L, 1L), .Label = c("NGO's", "PI", "PrI"), class = "factor"), 
    code = structure(c(5L, 10L, 11L, 12L, 13L, 18L, 19L, 1L, 
    2L, 5L, 14L, 15L, 16L, 17L, 6L, 7L, 8L, 9L, 18L, 20L, 1L, 
    3L, 4L), .Label = c("NGO1", "NGO2", "NGO3", "NGO4", "PI1", 
    "PI10", "PI11", "PI12", "PI13", "PI2", "PI3", "PI4", "PI5", 
    "PI6", "PI7", "PI8", "PI9", "PrI1", "PrI2", "PrI3"), class = "factor"), 
    Clossenness = structure(c(15L, 6L, 9L, 10L, 5L, 15L, 4L, 
    6L, 9L, 12L, 11L, 1L, 7L, 8L, 14L, 13L, 13L, 16L, 16L, 11L, 
    17L, 2L, 3L), .Label = c("0,485", "0,508", "0,524", "0,543", 
    "0,556", "0,568", "0,569", "0,579", "0,581", "0,595", "0,600", 
    "0,611", "0,623", "0,635", "0,658", "0,673", "0,750"), class = "factor")), .Names = c("", 
"time category", "code", "Clossenness"), row.names = c(NA, -23L
), class = "data.frame")

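Box plots show the information I need descriptively, but they make before/after changes harder to compare, because you have to read the box plots in pairs. I therefore think the kind of plot I describe above is more suitable. The difficulty is that I don't know a direct way to build that plot from the same information.

Expected result: https://ibb.co/WsrDN7D
Actual result: https://ibb.co/M6QWXLv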

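1 Answer:

Answer 0 (score: 1)

With group_by() and summarise() you can compute the mean of each category at each time point, and with spread() you can bring the two values together on the same row: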

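library(dplyr)  # %>%, group_by(), summarise()
library(tidyr)  # spread()

set.seed(1)

# Simulated data: 4 actors per category at each time point
df <- data.frame(
  time        = rep(c('before', 'after'), each = 8), 
  category    = rep(rep(c('PI', 'NGO'), each = 4), times = 2),
  clossenness = rnorm(16, .6, .1) 
) %>% 
  # mean closeness per time point and category
  group_by(time, category) %>% 
  summarise(mean_clos = mean(clossenness)) %>% 
  # one row per category, with 'after' and 'before' columns
  spread(key = time, value = mean_clos)

df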

  category after before
  <fct>    <dbl>  <dbl>
1 NGO      0.630  0.595
2 PI       0.573  0.659
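The question also asks for the standard deviation of each point, which the summary above does not compute. A minimal sketch of one way to get it, assuming dplyr 1.0 or later and tidyr 1.0 or later; the names raw, stats, wide and sd_clos are illustrative and not part of the original answer, and pivot_wider() is swapped in for spread() because it can widen two value columns at once:

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Same simulated data as in the answer
raw <- data.frame(
  time        = rep(c("before", "after"), each = 8),
  category    = rep(rep(c("PI", "NGO"), each = 4), times = 2),
  clossenness = rnorm(16, .6, .1)
)

# Mean and standard deviation per time point and category
stats <- raw %>%
  group_by(time, category) %>%
  summarise(
    mean_clos = mean(clossenness),
    sd_clos   = sd(clossenness),
    .groups   = "drop"
  )

# One row per category; columns are named <statistic>_<time>,
# e.g. mean_clos_before and sd_clos_after
wide <- stats %>%
  pivot_wider(names_from = time, values_from = c(mean_clos, sd_clos))

print(wide)
```

Each row of wide then carries everything needed for one point: its x and y position (the two means) and the length of its horizontal and vertical bars (the two standard deviations).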

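You can then draw each (before, after) point with geom_label() or geom_point() and compare it with the identity line (y = x) to see whether the value increased or decreased.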

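library(ggplot2)

df %>% 
  ggplot(aes(x = before, y = after)) +
  #geom_point() +
  geom_label(aes(label = category)) +            # label each point with its category
  geom_abline(intercept = 0, slope = 1) +        # identity line y = x
  xlim(c(.5, .7)) + ylim(c(.5, .7))              # same range on both axes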


In this example NGO lies above the identity line, so it increased, while PI lies below it, so it decreased.
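To apply the same idea to the data in the question and add the standard deviation bars, something along the following lines may work. It is only a sketch under a few assumptions: the values are transcribed from the table in the question, time 1 is treated as "before" and time 2 as "after", the decimal commas are converted to points before parsing, and the bars are drawn with geom_segment() as mean plus or minus one standard deviation. The names net, summary_wide, avg_* and sdev_* are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data transcribed from the question; closeness is text with decimal commas
net <- data.frame(
  time     = rep(c(1, 2), times = c(9, 14)),
  category = c(rep("PI", 5), rep("PrI", 2), rep("NGO's", 2),
               rep("PI", 9), rep("PrI", 2), rep("NGO's", 3)),
  Clossenness = c("0,658", "0,568", "0,581", "0,595", "0,556",
                  "0,658", "0,543", "0,568", "0,581",
                  "0,611", "0,600", "0,485", "0,569", "0,579",
                  "0,635", "0,623", "0,623", "0,673",
                  "0,673", "0,600", "0,750", "0,508", "0,524"),
  stringsAsFactors = FALSE
)

summary_wide <- net %>%
  mutate(
    # "0,658" -> 0.658
    closeness = as.numeric(sub(",", ".", Clossenness, fixed = TRUE)),
    time      = ifelse(time == 1, "before", "after")
  ) %>%
  group_by(time, category) %>%
  summarise(avg = mean(closeness), sdev = sd(closeness), .groups = "drop") %>%
  # one row per category: avg_before, avg_after, sdev_before, sdev_after
  pivot_wider(names_from = time, values_from = c(avg, sdev))

p <- ggplot(summary_wide, aes(x = avg_before, y = avg_after)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  # horizontal bar: "before" mean +/- 1 SD
  geom_segment(aes(x = avg_before - sdev_before, xend = avg_before + sdev_before,
                   y = avg_after, yend = avg_after)) +
  # vertical bar: "after" mean +/- 1 SD
  geom_segment(aes(x = avg_before, xend = avg_before,
                   y = avg_after - sdev_after, yend = avg_after + sdev_after)) +
  geom_point(size = 3) +
  geom_text(aes(label = category), vjust = -1) +
  labs(x = "Mean closeness before", y = "Mean closeness after")

print(p)
```

Points above the dashed line increased and points below it decreased. Note that some groups here contain only two actors, so their standard deviations rest on very little data.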