A better way to visualize complex data

Time: 2014-01-03 16:45:02

Tags: r ggplot2 data-visualization

I am creating a plot in R with ggplot2 from the following data.

 Hour.of.day     Model  N Distance.travelled       sd        se        ci
1         0100 h300_fv30 60          3.6264709 5.078277 0.6556027 1.3118579
2         0100 h300_fv35 60          2.9746019 5.313252 0.6859379 1.3725586
3         0100 h300_fv40 60          3.0422525 3.950650 0.5100267 1.0205610
4         0200 h300_fv30 60          4.3323896 6.866003 0.8863972 1.7736767
5         0200 h300_fv35 60          3.5567420 6.259378 0.8080823 1.6169689
6         0200 h300_fv40 60          2.5232512 4.533234 0.5852380 1.1710585
7         0300 h300_fv30 60          3.1800537 5.303506 0.6846797 1.3700409
8         0300 h300_fv35 60          2.9281442 4.445953 0.5739700 1.1485113
9         0300 h300_fv40 60          2.5078045 4.058295 0.5239236 1.0483687
10        0400 h300_fv30 60          3.3408231 4.567161 0.5896180 1.1798229
11        0400 h300_fv35 60          2.8679676 5.396700 0.6967110 1.3941155
12        0400 h300_fv40 60          3.1615813 4.244155 0.5479180 1.0963815
13        0500 h300_fv30 60          3.8117851 6.970900 0.8999394 1.8007745
14        0500 h300_fv35 60          2.1130581 3.925906 0.5068323 1.0141691
15        0500 h300_fv40 60          3.6430531 4.905484 0.6332953 1.2672209
16        0600 h300_fv30 60          3.5234762 5.150027 0.6648657 1.3303931
17        0600 h300_fv35 60          2.0341804 3.192176 0.4121082 0.8246266
18        0600 h300_fv40 60          3.2838958 3.770624 0.4867855 0.9740555
19        0700 h300_fv30 60          3.8327926 6.521022 0.8418603 1.6845587
20        0700 h300_fv35 60          1.6933289 2.607322 0.3366039 0.6735428
21        0700 h300_fv40 60          2.3896956 3.435656 0.4435413 0.8875241
22        0800 h300_fv30 60          3.3077466 6.504371 0.8397107 1.6802573
23        0800 h300_fv35 60          1.4823307 3.556884 0.4591917 0.9188405
24        0800 h300_fv40 60          2.4161741 3.571444 0.4610715 0.9226019
25        0900 h300_fv30 60          2.1506438 2.893029 0.3734885 0.7473487
26        0900 h300_fv35 60          1.8821961 3.457929 0.4464167 0.8932778
27        0900 h300_fv40 60          1.7896335 2.714514 0.3504423 0.7012334
28        1000 h300_fv30 60          2.5107475 5.491835 0.7089929 1.4186914
29        1000 h300_fv35 60          0.9491365 2.061712 0.2661658 0.5325966
30        1000 h300_fv40 60          1.6678013 3.234033 0.4175119 0.8354393
31        1100 h300_fv30 60          1.8602186 3.365695 0.4345093 0.8694511
32        1100 h300_fv35 60          1.4385708 2.869765 0.3704851 0.7413389
33        1100 h300_fv40 60          1.1273899 2.010280 0.2595261 0.5193105
34        1200 h300_fv30 60          1.4870763 2.112841 0.2727667 0.5458048
35        1200 h300_fv35 60          2.5295481 4.740384 0.6119810 1.2245711
36        1200 h300_fv40 60          1.6551202 3.051420 0.3939366 0.7882653
37        1300 h300_fv30 60          2.8791490 4.925870 0.6359271 1.2724872
38        1300 h300_fv35 60          2.4731563 5.266690 0.6799268 1.3605303
39        1300 h300_fv40 60          4.5989133 8.394460 1.0837201 2.1685189
40        1400 h300_fv30 60          1.5050205 3.188480 0.4116310 0.8236717
41        1400 h300_fv35 60          1.7615688 3.064842 0.3956693 0.7917325
42        1400 h300_fv40 60          2.2766514 5.215937 0.6733746 1.3474194
43        1500 h300_fv30 60          1.9097882 2.770040 0.3576106 0.7155772
44        1500 h300_fv35 60          2.0109347 4.070014 0.5254365 1.0513961
45        1500 h300_fv40 60          1.6316881 4.119681 0.5318485 1.0642264
46        1600 h300_fv30 60          3.3246263 5.352698 0.6910304 1.3827486
47        1600 h300_fv35 60          2.0389703 3.781869 0.4882372 0.9769604
48        1600 h300_fv40 60          1.0204568 2.205685 0.2847527 0.5697888
49        1700 h300_fv30 60          3.6132519 5.467875 0.7058996 1.4125019
50        1700 h300_fv35 60          2.1139255 4.178283 0.5394140 1.0793648
51        1700 h300_fv40 60          1.5547818 3.411135 0.4403756 0.8811895
52        1800 h300_fv30 60          5.0552532 7.344069 0.9481152 1.8971742
53        1800 h300_fv35 60          2.1832792 3.824244 0.4937078 0.9879070
54        1800 h300_fv40 60          1.6532516 3.273697 0.4226325 0.8456856
55        1900 h300_fv30 60          5.6107731 6.891023 0.8896272 1.7801399
56        1900 h300_fv35 60          2.9822004 5.958244 0.7692060 1.5391777
57        1900 h300_fv40 60          2.7111394 3.798765 0.4904184 0.9813250
58        2000 h300_fv30 60          6.0438385 7.126952 0.9200855 1.8410868
59        2000 h300_fv35 60          3.9517888 6.462761 0.8343388 1.6695081
60        2000 h300_fv40 60          3.9508503 5.374253 0.6938130 1.3883167
61        2100 h300_fv30 60          4.2144712 5.648673 0.7292406 1.4592070
62        2100 h300_fv35 60          2.2205186 3.397391 0.4386013 0.8776392
63        2100 h300_fv40 60          3.9000010 5.881409 0.7592866 1.5193290
64        2200 h300_fv30 60          3.9478958 5.584154 0.7209112 1.4425401
65        2200 h300_fv35 60          3.1612149 4.788883 0.6182421 1.2370996
66        2200 h300_fv40 60          3.7812992 6.424478 0.8293965 1.6596186
67        2300 h300_fv30 61          3.3860628 5.176299 0.6627571 1.3257117
68        2300 h300_fv35 61          3.7427743 6.257596 0.8012031 1.6026448
69        2300 h300_fv40 61          3.6674335 4.945831 0.6332487 1.2666861
70        2400 h300_fv30 59          3.8745470 5.763821 0.7503856 1.5020600
71        2400 h300_fv35 59          3.1284346 5.016476 0.6530895 1.3073007
72        2400 h300_fv40 59          3.7563017 4.819053 0.6273872 1.2558520
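
For readers reproducing the table: the se and ci columns appear to follow the usual definitions, with se as the standard error of the mean and ci as the half-width of a 95% t-interval. That is an inference from the numbers, not something the post states. A minimal check against the first row (the variable names are illustrative):

```r
# First row of the table: Hour.of.day 0100, Model h300_fv30
n       <- 60
sd_row1 <- 5.078277

# Assumed definitions (not stated in the post)
se <- sd_row1 / sqrt(n)            # standard error of the mean
ci <- se * qt(0.975, df = n - 1)   # half-width of a 95% t-interval

# Compare with the table's se (0.6556027) and ci (1.3118579)
c(se = se, ci = ci)
```

If these definitions hold, Distance.travelled - ci and Distance.travelled + ci in the plotting code below are the bounds of a 95% confidence interval for each mean.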

The plotting code is:

ggplot(my_data, aes(x=Hour.of.day, y=Distance.travelled, colour=Model)) + 
    geom_errorbar(aes(ymin = Distance.travelled - ci, ymax = Distance.travelled + ci), width=.1, position=position_dodge(2)) + 
    geom_line(position=position_dodge(2)) + 
    geom_point(position=position_dodge(2)) + 
    scale_x_discrete(breaks=c("0600", "1200", "1800", "2400")) + 
    theme(axis.ticks = element_blank())

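In the resulting plot it is hard to tell the three patterns apart.

Does anyone have suggestions for improving the visualization so that the three patterns are easier to distinguish? For example, some way to emphasize the mean points and push the confidence intervals into the background?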

2 answers:

Answer 0: (score: 18)

Use lines and ribbons:

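library(ggplot2)
# group = Model keeps one line and one ribbon per model,
# even when Hour.of.day is a factor
ggplot(my_data, aes(x = Hour.of.day, y = Distance.travelled,
                    fill = Model, group = Model)) +
    theme_bw() +
    geom_line(aes(colour = Model)) +
    # semi-transparent ribbon showing the confidence interval
    geom_ribbon(aes(ymin = Distance.travelled - ci,
                    ymax = Distance.travelled + ci), alpha = 0.4) +
    scale_x_discrete(breaks = c("0600", "1200", "1800", "2400")) +
    theme(axis.ticks = element_blank())
ggsave("ribbonplot.png", width = 7, height = 4)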


If you want to emphasize the mean pattern more strongly, you can make the lines wider (lwd) or the ribbons lighter (a lower alpha).
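
A minimal sketch of that adjustment, assuming a small illustrative subset of the question's data (the name `demo` and the values alpha = 0.2 and lwd = 1.5 are arbitrary choices, not part of the original answer). The ribbon is drawn first so that the thicker lines sit on top of it:

```r
library(ggplot2)

# Illustrative subset: the first three hours from the question's table
demo <- data.frame(
  Hour.of.day = rep(c("0100", "0200", "0300"), each = 3),
  Model = rep(c("h300_fv30", "h300_fv35", "h300_fv40"), times = 3),
  Distance.travelled = c(3.6264709, 2.9746019, 3.0422525,
                         4.3323896, 3.5567420, 2.5232512,
                         3.1800537, 2.9281442, 2.5078045),
  ci = c(1.3118579, 1.3725586, 1.0205610,
         1.7736767, 1.6169689, 1.1710585,
         1.3700409, 1.1485113, 1.0483687)
)

p <- ggplot(demo, aes(x = Hour.of.day, y = Distance.travelled, group = Model)) +
  theme_bw() +
  # lighter ribbon: the confidence interval recedes into the background
  geom_ribbon(aes(ymin = Distance.travelled - ci,
                  ymax = Distance.travelled + ci,
                  fill = Model), alpha = 0.2) +
  # wider line: the mean pattern stands out
  geom_line(aes(colour = Model), lwd = 1.5)

# print(p) renders the plot
```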

Answer 1: (score: 10)

Here is another way, using facets:

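# gg is assumed to hold the data shown in the question
ggplot(gg, aes(x = Hour.of.day, y = Distance.travelled)) +
    geom_pointrange(aes(ymin = Distance.travelled - ci,
                        ymax = Distance.travelled + ci, color = Model)) +
    facet_grid(Model ~ .) +
    # dashed line: intercept-only fit, i.e. the mean within each panel
    stat_smooth(formula = y ~ 1, method = "lm", linetype = 2, se = FALSE) +
    # dotted line: mean over all models
    geom_abline(aes(slope = 0, intercept = mean(Distance.travelled)), linetype = 3)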

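The main idea here is that the data should have a frame of reference (here, the mean distance travelled for a given model). This shows at a glance where the distance travelled differs markedly from that mean. The grey dotted line is the average over all models; it tells you whether a given model tends to sit above or below the overall average over time.

If you set se = TRUE in the call to stat_smooth(...), you also get the variability of the mean, but I think all the shading would detract from the main point.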
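
The two reference lines can also be precomputed and drawn with geom_hline, which makes it explicit what each line represents. This is a sketch of an alternative, not the answer's own code; it assumes a small illustrative subset of the data, and the names `demo`, `model_means`, and `overall_mean` are hypothetical:

```r
library(ggplot2)

# Illustrative subset: the first three hours from the question's table
demo <- data.frame(
  Hour.of.day = rep(c("0100", "0200", "0300"), each = 3),
  Model = rep(c("h300_fv30", "h300_fv35", "h300_fv40"), times = 3),
  Distance.travelled = c(3.6264709, 2.9746019, 3.0422525,
                         4.3323896, 3.5567420, 2.5232512,
                         3.1800537, 2.9281442, 2.5078045),
  ci = c(1.3118579, 1.3725586, 1.0205610,
         1.7736767, 1.6169689, 1.1710585,
         1.3700409, 1.1485113, 1.0483687)
)

# Reference values: one mean per model, plus the mean over all models
model_means  <- aggregate(Distance.travelled ~ Model, data = demo, FUN = mean)
overall_mean <- mean(demo$Distance.travelled)

p <- ggplot(demo, aes(x = Hour.of.day, y = Distance.travelled)) +
  geom_pointrange(aes(ymin = Distance.travelled - ci,
                      ymax = Distance.travelled + ci,
                      colour = Model)) +
  facet_grid(Model ~ .) +
  # dashed line: the model's own mean, matched to panels by the Model column
  geom_hline(data = model_means, aes(yintercept = Distance.travelled),
             linetype = 2) +
  # dotted line: the mean over all models, identical in every panel
  geom_hline(yintercept = overall_mean, linetype = 3)

# print(p) renders the plot
```

Because `model_means` carries a Model column, facet_grid places each dashed line in the matching panel, while the single `overall_mean` value is repeated across panels.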