Making ggplot2's `geom_point` vary depending on a specific condition

Time: 2017-06-09 19:32:18

Tags: r ggplot2

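I have an R script that generates plots from the run-time data of a simulation. Sometimes, however, an error occurs during a run. This produces a null run-time value and makes the plot look as though the run time was shorter than it actually was.

Here is a sample of the data in the `data` data frame: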

| Version | TotalMean | TestNum |  Case |
|:-------:|:---------:|:-------:|:-----:|
| 1.0.1   |       350 |       1 | Case1 |
| 1.0.2   |       430 |       2 | Case1 |
| 1.0.4   |       470 |       3 | Case1 |
| 1.0.7   |       445 |       4 | Case1 |
| 1.0.1   |       320 |       1 | Case2 |
| 1.0.2   |       280 |       2 | Case2 |
| 1.0.4   |       450 |       3 | Case2 |
| 1.0.7   |       420 |       4 | Case2 |
| 1.0.1   |       335 |       1 | Case3 |
| 1.0.2   |       415 |       2 | Case3 |
| 1.0.4   |       465 |       3 | Case3 |
| 1.0.7   |       430 |       4 | Case3 |
| 1.0.1   |       310 |       1 | Case4 |
| 1.0.2   |       375 |       2 | Case4 |
| 1.0.4   |       425 |       3 | Case4 |
| 1.0.7   |       410 |       4 | Case4 |

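Note that no null values are listed in this table. That is because the way the TotalMean column is calculated never reflects them. The null values are, however, present in the data frame that TotalMean is calculated from. Is there a way to make geom_point depend on whether there is a null value in some table? Perhaps by changing the shape and size?

The following code creates a working example. Version 1.0.2 in Case2 has an outlying value because it had a null value in the original table.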

```r
library(ggplot2)

Version <- c("1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7")
TotalMean <- c(350,430,470,445,320,280,450,420,335,415,465,430,310,375,425,410)
TestNum <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Case <- c("Case1","Case1","Case1","Case1","Case2","Case2","Case2","Case2","Case3","Case3","Case3","Case3","Case4","Case4","Case4","Case4")
data <- data.frame(Version,TotalMean,TestNum,Case)
# Order the version labels by test number so the x axis follows release order.
versions <- unique(data[order(data$TestNum), ][,1])
data$Version <- factor(data$Version, levels = versions)
```

Here is the code I use to create the plot (using ggplot2):

```r
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case)) + 
    geom_line() + geom_point(shape = 16, size = 2) + coord_cartesian(ylim=c(0,550)) + 
    labs(x="Version", y="Run Time (minutes)") + 
    stat_summary(fun=sum, geom="line") + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
    theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) + 
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + 
    theme(axis.title.y = element_text(vjust = 1))
g
```

1 Answer:

Answer 0 (score: 1)

I built a data frame like the one shown below (its structure is at the bottom):

```r
#    Version First_Run Second_Run TestNum  Case 
# 1    1.0.1       350        350       1 Case1 
# 2    1.0.2       430        430       2 Case1 
# 3    1.0.4       470        470       3 Case1 
# 4    1.0.7       445        445       4 Case1 
# 5    1.0.1       320        320       1 Case2 
# 6    1.0.2       560         NA       2 Case2 
# 7    1.0.4       450        450       3 Case2 
# 8    1.0.7       420        420       4 Case2 
# 9    1.0.1       335        335       1 Case3 
# 10   1.0.2       415        415       2 Case3 
# 11   1.0.4       465        465       3 Case3 
# 12   1.0.7       430        430       4 Case3 
# 13   1.0.1       310        310       1 Case4 
# 14   1.0.2       375        375       2 Case4 
# 15   1.0.4       425        425       3 Case4 
# 16   1.0.7       410        410       4 Case4
```

Then I calculated the mean column and the shape column:

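```r
# Row-wise mean of the two runs, ignoring missing values.
data$TotalMean <- rowMeans(subset(data, select = c(First_Run, Second_Run)), na.rm = TRUE)

# The product is NA if either run is NA, so "b" marks rows with a missing run.
data$shapeflag <- ifelse(is.na(data$First_Run * data$Second_Run), "b", "a")
```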

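Note: `na.rm = TRUE` omits NA values when computing the mean, so the mean is corrected in the calculation while the `shapeflag` column still identifies the specific runs that returned a null value. You can see that it returns 560 for the sixth row instead of 280.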
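To make the effect of `na.rm` concrete, here is a minimal base-R sketch that uses only the two values from row 6 (560 and NA). The name `runs` is illustrative and not part of the answer. The last line is only an assumption about how the 280 in the question might have come about (a missing run counted as 0); the question does not confirm it.

```r
# Row 6 of the example data: one run succeeded, the other returned NA.
runs <- data.frame(First_Run = 560, Second_Run = NA_real_)

rowMeans(runs)                 # NA: a missing value propagates by default
rowMeans(runs, na.rm = TRUE)   # 560: the NA is dropped before averaging

# Flag the row if either run is missing; equivalent to testing is.na(a * b).
flag <- ifelse(is.na(runs$First_Run) | is.na(runs$Second_Run), "b", "a")
flag                           # "b"

# Assumption, not stated in the question: counting the NA as 0 gives 280.
mean(c(560, 0))                # 280
```

Testing each column with `is.na()` reads a little more directly than multiplying the columns, but both approaches flag the same rows.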

This is what the final data set looks like:

```r
#    Version First_Run Second_Run TestNum  Case TotalMean shapeflag 
# 1    1.0.1       350        350       1 Case1       350         a 
# 2    1.0.2       430        430       2 Case1       430         a 
# 3    1.0.4       470        470       3 Case1       470         a 
# 4    1.0.7       445        445       4 Case1       445         a 
# 5    1.0.1       320        320       1 Case2       320         a 
# 6    1.0.2       560         NA       2 Case2       560         b 
# 7    1.0.4       450        450       3 Case2       450         a 
# 8    1.0.7       420        420       4 Case2       420         a 
# 9    1.0.1       335        335       1 Case3       335         a 
# 10   1.0.2       415        415       2 Case3       415         a 
# 11   1.0.4       465        465       3 Case3       465         a 
# 12   1.0.7       430        430       4 Case3       430         a 
# 13   1.0.1       310        310       1 Case4       310         a 
# 14   1.0.2       375        375       2 Case4       375         a 
# 15   1.0.4       425        425       3 Case4       425         a 
# 16   1.0.7       410        410       4 Case4       410         a
```

Now we can set the shape from a variable in the data frame, inside `aes`:

```r
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case,
                      shape = shapeflag)) + #Set the shape
  geom_line() + geom_point(size = 3) + coord_cartesian(ylim=c(0,550)) + 
  labs(x="Version", y="Run Time (minutes)") + 
  stat_summary(fun=sum, geom="line") + # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + 
  theme(axis.title.y = element_text(vjust = 1)) +
  scale_shape_discrete(labels=c("norm","null"),name="runs") #Edit the legend
```
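The question also asks about changing the size. One possible variation, sketched here rather than taken from the answer, maps both `shape` and `size` to `shapeflag` inside `geom_point()` only, and picks the values explicitly with `scale_shape_manual()` and `scale_size_manual()`. The data frame `demo` is a cut-down stand-in (Case2 only) so that the block runs on its own, and the shape codes and sizes are arbitrary choices.

```r
library(ggplot2)

# Cut-down stand-in for the answer's data (Case2 only), for illustration.
demo <- data.frame(
  Version   = factor(c("1.0.1", "1.0.2", "1.0.4", "1.0.7")),
  TotalMean = c(320, 560, 450, 420),
  Case      = "Case2",
  shapeflag = c("a", "b", "a", "a")
)

g2 <- ggplot(demo, aes(x = Version, y = TotalMean, color = Case, group = Case)) +
  geom_line() +
  # Map shape and size in the point layer only, so the line is unaffected.
  geom_point(aes(shape = shapeflag, size = shapeflag)) +
  # Named vectors tie each flag value to a specific shape and size.
  scale_shape_manual(values = c(a = 16, b = 17),
                     labels = c("norm", "null"), name = "runs") +
  scale_size_manual(values = c(a = 2, b = 4),
                    labels = c("norm", "null"), name = "runs") +
  labs(x = "Version", y = "Run Time (minutes)")
g2
```

Giving both scales the same `name` and `labels` lets ggplot2 merge them into a single legend instead of drawing one for shape and another for size.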

This is the resulting plot, drawn by printing `g`:

https://i.stack.imgur.com/Y4Lce.png

Data:

```r
data <- 
       structure(list(Version = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
       3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1.0.1", 
       "1.0.2", "1.0.4", "1.0.7"), class = "factor"), First_Run = c(350, 
       430, 470, 445, 320, 560, 450, 420, 335, 415, 465, 430, 310, 375, 
       425, 410), Second_Run = c(350, 430, 470, 445, 320, NA, 450, 420, 
       335, 415, 465, 430, 310, 375, 425, 410), TestNum = c(1, 2, 3, 
       4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Case = structure(c(1L, 
       1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Case1", 
       "Case2", "Case3", "Case4"), class = "factor")), .Names = c("Version", 
       "First_Run", "Second_Run", "TestNum", "Case"), row.names = c(NA, 
       -16L), class = "data.frame")
```