Improving a plot with 3 variables

Date: 2014-03-12 10:21:34

Tags: r plot ggplot2

I have the following data.frame:

> dd
           V1        V2           V3
1   14.743730 1.5762030 1.820564e+05
2   11.293525 1.5616743 1.849190e+07
3    9.937889 4.2807281 5.226222e+07
4   15.483217 0.6055921 1.612945e+05
5   11.512925 0.8590718 1.653430e+07
6    9.271709 3.5639570 1.648311e+08
7   12.154779 1.0913056 7.725100e+06
8   12.254863 2.2639289 5.767500e+06
9   10.868568 1.4670616 2.142830e+07
10  12.384219 0.8867792 2.831100e+06
11  13.742940 0.3268744 1.516208e+06
12  12.315132 1.2894085 4.788700e+06
13  14.989849 0.5521075 1.768097e+05
14  11.451050 1.1676040 1.751310e+07
15  15.363073 0.6223934 1.657917e+05
16  12.899220 0.4755159 1.967226e+06
17  12.464293 0.9886397 2.086363e+06
18  12.736701 0.4495683 2.018285e+06
19   8.616858 4.5335367 2.774000e+08
20  10.950807 1.6357879 2.142830e+07
21  11.005428 2.6383457 2.044950e+07
22   9.629051 2.8459297 1.648311e+08
23  12.043554 1.6499405 9.682700e+06
24  14.914123 0.5430869 1.785336e+05
25  16.979896 0.3030517 2.360639e+04
26  13.815511 1.0962220 1.456639e+06
27  15.017750 0.4717264 1.760602e+05
28  11.849398 0.9813975 1.261910e+07
29  10.454495 3.5180136 2.338590e+07
30   9.011889 3.1449919 1.648311e+08
31   9.553930 3.5578561 1.648311e+08
32  11.608236 1.3658448 1.555550e+07
33  13.369223 1.0920776 1.762991e+06
34  11.515771 1.4969232 1.653430e+07
35   8.764053 3.9874923 2.774000e+08
36  10.122623 1.7772289 5.226222e+07
37  14.230083 1.0955896 1.022641e+06
38  10.098232 2.3853124 5.226222e+07
39  10.714418 2.3483052 2.240710e+07
40   8.969804 4.1778522 1.648311e+08
41  17.924744 0.9372727 1.354203e+04
42   7.811163 8.3438712 2.774000e+08
43  18.910904 0.6453018 6.860896e+03
44  10.839581 1.7566555 2.142830e+07
45  10.839581 1.6449275 2.142830e+07
46  13.870945 0.5644657 1.414090e+06
47  11.440355 0.8434520 1.751310e+07
48  13.923468 0.8897043 1.363032e+06
49  11.617285 1.0667866 1.555550e+07
50  11.502875 0.5134841 1.653430e+07
51  18.078190 0.3824371 1.288279e+04
52  13.304685 0.6976290 1.797030e+06
53   9.629051 4.0785583 1.648311e+08
54  17.460501 0.7800599 1.501846e+04
55  12.623137 2.2468834 2.052324e+06
56  10.982212 2.7085846 2.044950e+07
57  10.540937 3.5114572 2.240710e+07
58  13.892472 0.8788488 1.388561e+06
59  11.679287 1.4905993 1.457670e+07
60  13.785051 0.8933495 1.482169e+06
61   8.006368 6.2710499 2.774000e+08
62   9.210340 2.5349723 1.648311e+08
63  13.122363 0.6069901 1.882128e+06
64  17.359364 0.6707361 1.525865e+04
65  18.195729 0.3666130 1.230514e+04
66  11.751942 1.2659074 1.457670e+07
67  10.477288 1.5443280 2.338590e+07
68  11.517913 0.8443011 1.653430e+07
69  11.476261 2.2252419 1.751310e+07
70   9.705037 3.5185753 1.648311e+08
71  12.647548 1.3738172 2.043814e+06
72  11.231888 2.0682796 1.947070e+07
73  10.889304 3.7001075 2.142830e+07
74  12.283497 2.2255645 5.767500e+06
75  10.933107 1.2043548 2.142830e+07
76  11.881727 1.0832527 1.261910e+07
77  11.191342 1.8457868 1.947070e+07
78  16.801192 0.4532456 5.261309e+04
79  13.028931 1.5979574 1.924677e+06
80  10.668955 1.0840667 2.240710e+07
81  10.961278 2.3257595 2.044950e+07
82   8.895630 3.5105186 2.774000e+08
83  16.518106 0.4719416 8.919001e+04
84  13.334976 0.7971067 1.780011e+06
85  13.617060 1.2195412 1.609815e+06
86   9.908475 5.4032295 5.226222e+07
87   8.881836 4.5779464 2.774000e+08
88  16.603536 0.6787417 7.922130e+04
89  17.529083 0.5859315 1.484092e+04
90  15.226498 0.9309800 1.702888e+05
91  11.478334 1.6612984 1.751310e+07
92   9.257033 6.6170833 1.648311e+08
93  16.001562 0.8570780 1.343115e+05
94  14.669926 0.4920395 3.078192e+05
95  17.804495 0.4367456 1.399240e+04
96  18.292847 0.6576827 1.177319e+04
97  10.792565 2.4264054 2.142830e+07
98  15.717618 0.5619723 1.508011e+05
99  14.077875 1.1319117 1.201346e+06
100 12.007622 1.8263940 1.066150e+07
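
To reproduce the examples, the printed table can be read back into R. The sketch below does this for the first five rows only; it assumes the text is pasted exactly as printed, so that `read.table` treats the unnamed first column as row names (the header line has one field fewer than the data lines).

```r
# Rebuild a small slice of dd from the printed output (first five rows only).
# Because the header has one field fewer than the data rows, read.table()
# uses the first column as row names.
dd <- read.table(header = TRUE, text = "
         V1        V2           V3
1 14.743730 1.5762030 1.820564e+05
2 11.293525 1.5616743 1.849190e+07
3  9.937889 4.2807281 5.226222e+07
4 15.483217 0.6055921 1.612945e+05
5 11.512925 0.8590718 1.653430e+07
")

str(dd)  # inspect the column names and types
```

Pasting all 100 rows into the `text` argument in the same way should give the full data set.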

I would like to produce a figure that shows all three variables.

I am currently using

p <- ggplot(dd, aes(V1,V2))
p + geom_point()
p + geom_point(aes(size = V3)) + scale_size_area() + theme_bw() + 
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank()) +
  theme(axis.line = element_line(color = 'black')) +
  xlab("V1") +
  ylab("V2")

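which produces

[Figure: scatter plot of V2 against V1, with point size scaled by V3]

However, I don't think this is the most compelling figure. Can anyone suggest another type of plot that would make it better?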

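2 Answers:

Answer 0 (score: 2)

You have many options for improving it. It is hard to say which is best without knowing the purpose of the plot (publication, website, report...).

A very simple example: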

p + geom_point(aes(size = V3), shape = 21, colour = 'black',
               fill = 'blue', alpha = .5) +
  scale_size(expression('m'^3),
             range = c(3, 10),
             breaks = c(0.1, 5, 20) * 10000000,
             labels = c("low","mid", "high")) +
  theme_bw() + 
  theme(
    plot.background = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    axis.line = element_line(color = 'black')) +
  xlab("V1") +
  ylab("V2")

Or a shorter version with nearly the same output (it additionally sets a title and renames the axis labels):

p + geom_point(aes(size = V3), shape = 21, colour = 'black',
               fill = 'blue', alpha = .5) +
  scale_size(expression('m'^3),
             range = c(3, 10),
             breaks = c(0.1, 5, 20) * 10000000,
             labels = c("low","mid", "high")) +
  theme_classic() +
  labs(title = 'My plot\n', x = "Var1", y = 'Var2')

As suggested by user @blmoore.
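
Another of the many possible options, given only as a sketch: encode V3 as fill colour instead of point size, so every point stays the same size and remains visible. The `log10` transform is an assumption made here because V3 spans several orders of magnitude. `dd_small` and `p_fill` are names invented for this example, and `dd_small` holds just a few rows copied from `dd` so the code runs on its own.

```r
library(ggplot2)

# A few rows copied from dd so the sketch is self-contained;
# with the full data, use dd instead of dd_small.
dd_small <- data.frame(
  V1 = c(14.743730, 9.937889, 8.616858, 16.979896, 18.910904, 7.811163),
  V2 = c(1.5762030, 4.2807281, 4.5335367, 0.3030517, 0.6453018, 8.3438712),
  V3 = c(1.820564e+05, 5.226222e+07, 2.774000e+08,
         2.360639e+04, 6.860896e+03, 2.774000e+08)
)

# Constant point size; V3 (on a log10 scale) drives the fill colour.
p_fill <- ggplot(dd_small, aes(V1, V2)) +
  geom_point(aes(fill = log10(V3)), shape = 21, colour = "black", size = 4) +
  scale_fill_gradient(name = "log10(V3)", low = "white", high = "blue") +
  theme_classic() +
  labs(x = "V1", y = "V2")

p_fill
```

Colour is harder to read quantitatively than size, so this is a trade-off rather than a strict improvement.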

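Answer 1 (score: 2)

IMHO, a scatter plot suits this kind of data very well. However, the points with V1 > 15 are currently barely visible, so the plot loses informational value. The plot can therefore be improved. I also made the code more compact.

Code: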

ggplot(dd, aes(V1,V2)) +
  geom_point(aes(size=V3), shape=21, colour="black", fill="red", alpha=.5) + 
  scale_size(expression("m"^3), range = c(2, 12),
             breaks = c(0.01,3,8,15,25) * 10000000,
             labels = c("very low","low","medium","high","very high")) + 
  labs(title="Plot title\n", x="V1", y="V2") +
  theme_classic()

Result: [Figure: the resulting scatter plot]
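
The points with V1 > 15 are hard to see because V3 ranges from roughly 6.9e3 to 2.8e8 in the data above, so a linear size scale shrinks the smallest values to almost nothing. One possible variation, sketched under the assumption that a logarithmic encoding is acceptable for the audience, is to map size to `log10(V3)`. `dd_small` and `p_log` are names invented for this example, and `dd_small` contains only a handful of rows copied from `dd`.

```r
library(ggplot2)

# A few rows copied from dd so the sketch is self-contained;
# with the full data, use dd instead of dd_small.
dd_small <- data.frame(
  V1 = c(14.743730, 9.937889, 8.616858, 16.979896, 18.910904, 7.811163),
  V2 = c(1.5762030, 4.2807281, 4.5335367, 0.3030517, 0.6453018, 8.3438712),
  V3 = c(1.820564e+05, 5.226222e+07, 2.774000e+08,
         2.360639e+04, 6.860896e+03, 2.774000e+08)
)

# Map point size to log10(V3) so small V3 values still get a visible point.
p_log <- ggplot(dd_small, aes(V1, V2)) +
  geom_point(aes(size = log10(V3)), shape = 21, colour = "black",
             fill = "red", alpha = .5) +
  scale_size(name = "log10(V3)", range = c(2, 12)) +
  labs(title = "Plot title\n", x = "V1", y = "V2") +
  theme_classic()

p_log
```

With this mapping the point sizes are no longer proportional to V3 itself, so the legend should make the log scale explicit.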