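I have an R script that generates plots from simulated run-time data. Sometimes an error occurs during a run, which produces a null run-time value and makes the plot show a run time lower than the actual one.
Here is a sample of the data in the `data` data frame: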
| Version | TotalMean | TestNum | Case |
|:-------:|:---------:|:-------:|:-----:|
| 1.0.1 | 350 | 1 | Case1 |
| 1.0.2 | 430 | 2 | Case1 |
| 1.0.4 | 470 | 3 | Case1 |
| 1.0.7 | 445 | 4 | Case1 |
| 1.0.1 | 320 | 1 | Case2 |
| 1.0.2 | 280 | 2 | Case2 |
| 1.0.4 | 450 | 3 | Case2 |
| 1.0.7 | 420 | 4 | Case2 |
| 1.0.1 | 335 | 1 | Case3 |
| 1.0.2 | 415 | 2 | Case3 |
| 1.0.4 | 465 | 3 | Case3 |
| 1.0.7 | 430 | 4 | Case3 |
| 1.0.1 | 310 | 1 | Case4 |
| 1.0.2 | 375 | 2 | Case4 |
| 1.0.4 | 425 | 3 | Case4 |
| 1.0.7 | 410 | 4 | Case4 |
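Note that no null values appear in this table. That is because the way the `TotalMean` column is computed never reflects them. However, null values do occur in the data frame from which `TotalMean` is calculated. Is there a way to make `geom_point` depend on whether that source table contains a null? Perhaps by changing the shape and size?
The following code creates a working example. Version 1.0.2 in Case2 is an outlier because it has a null value in the original table.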
library(ggplot2)
Version <- c("1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7")
TotalMean <- c(350,430,470,445,320,280,450,420,335,415,465,430,310,375,425,410)
TestNum <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Case <- c("Case1","Case1","Case1","Case1","Case2","Case2","Case2","Case2","Case3","Case3","Case3","Case3","Case4","Case4","Case4","Case4")
data <- data.frame(Version,TotalMean,TestNum,Case)
versions <- unique(data[order(data$TestNum), ][,1])
data$Version <- factor(data$Version, levels = versions)
This is the code I use to create the plot (using ggplot2):
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case)) +
  geom_line() + geom_point(shape = 16, size = 2) + coord_cartesian(ylim = c(0, 550)) +
  labs(x = "Version", y = "Run Time (minutes)") +
  # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = sum, geom = "line") +
  theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  theme(axis.title.y = element_text(vjust = 1))
g
Answer 0 (score: 1)
I built a data frame that looks like this (its structure is at the bottom):
# Version First_Run Second_Run TestNum Case
# 1 1.0.1 350 350 1 Case1
# 2 1.0.2 430 430 2 Case1
# 3 1.0.4 470 470 3 Case1
# 4 1.0.7 445 445 4 Case1
# 5 1.0.1 320 320 1 Case2
# 6 1.0.2 560 NA 2 Case2
# 7 1.0.4 450 450 3 Case2
# 8 1.0.7 420 420 4 Case2
# 9 1.0.1 335 335 1 Case3
# 10 1.0.2 415 415 2 Case3
# 11 1.0.4 465 465 3 Case3
# 12 1.0.7 430 430 4 Case3
# 13 1.0.1 310 310 1 Case4
# 14 1.0.2 375 375 2 Case4
# 15 1.0.4 425 425 3 Case4
# 16 1.0.7 410 410 4 Case4
Then I computed the mean and the shape-flag columns:
data$TotalMean <- rowMeans(subset(data, select = c(First_Run, Second_Run)), na.rm = TRUE)
data$shapeflag <- ifelse(is.na(data$First_Run * data$Second_Run), "b", "a")
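Note: `na.rm = TRUE` omits `NA` values when computing the mean, so you get an adjusted mean in your calculation and still keep the `shapeflag` column to identify the particular runs that returned NULL. You can see that row six returns 560 rather than 280.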
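To make the note concrete, here is a minimal sketch on a hypothetical three-row sample (the `runs` data frame and the variable names are made up for illustration). It compares the row mean with and without `na.rm = TRUE`, and shows why `is.na()` on the product of the two columns flags the failed run. The 280 figure is reproduced under the assumption that the original script counts a failed run as 0; the question does not confirm this.

```r
# Hypothetical sample; the middle row mimics the failed run (row six above).
runs <- data.frame(First_Run  = c(320, 560, 450),
                   Second_Run = c(320, NA, 450))

# Without na.rm the NA propagates into the mean of that row.
mean_with_na <- rowMeans(runs)

# With na.rm = TRUE the NA is dropped and only the valid run is averaged.
mean_no_na <- rowMeans(runs, na.rm = TRUE)

# Assumption: counting the failed run as 0 halves the mean of that row.
runs_zero <- runs
runs_zero[is.na(runs_zero)] <- 0
mean_zero <- rowMeans(runs_zero)

# NA also propagates through multiplication, so is.na() on the product
# is TRUE whenever either run is missing.
shapeflag <- ifelse(is.na(runs$First_Run * runs$Second_Run), "b", "a")

print(data.frame(mean_with_na, mean_no_na, mean_zero, shapeflag))
```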
This is what the final data set looks like:
# Version First_Run Second_Run TestNum Case TotalMean shapeflag
# 1 1.0.1 350 350 1 Case1 350 a
# 2 1.0.2 430 430 2 Case1 430 a
# 3 1.0.4 470 470 3 Case1 470 a
# 4 1.0.7 445 445 4 Case1 445 a
# 5 1.0.1 320 320 1 Case2 320 a
# 6 1.0.2 560 NA 2 Case2 560 b
# 7 1.0.4 450 450 3 Case2 450 a
# 8 1.0.7 420 420 4 Case2 420 a
# 9 1.0.1 335 335 1 Case3 335 a
# 10 1.0.2 415 415 2 Case3 415 a
# 11 1.0.4 465 465 3 Case3 465 a
# 12 1.0.7 430 430 4 Case3 430 a
# 13 1.0.1 310 310 1 Case4 310 a
# 14 1.0.2 375 375 2 Case4 375 a
# 15 1.0.4 425 425 3 Case4 425 a
# 16 1.0.7 410 410 4 Case4 410 a
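The flag above is derived from two run columns. If there are more of them, the same idea can be sketched with `rowSums(is.na(...))`, which counts the missing runs per row. The third run column, its values, and the `_Run` naming pattern are assumptions made for this example, not part of the data above.

```r
# Hypothetical data with three run columns (names and values are assumptions).
runs <- data.frame(
  Version    = c("1.0.1", "1.0.2", "1.0.4"),
  First_Run  = c(320, 560, 450),
  Second_Run = c(320, NA, 450),
  Third_Run  = c(NA, 555, 450)
)

# Pick every column whose name ends in "_Run".
run_cols <- grep("_Run$", names(runs), value = TRUE)

# Mean over the valid runs only, plus a count of the missing ones.
runs$TotalMean <- rowMeans(runs[run_cols], na.rm = TRUE)
runs$n_missing <- rowSums(is.na(runs[run_cols]))

# "b" marks rows with at least one missing run, "a" marks complete rows.
runs$shapeflag <- ifelse(runs$n_missing > 0, "b", "a")

print(runs)
```

One caveat: if every run in a row is `NA`, `rowMeans(..., na.rm = TRUE)` returns `NaN` for that row, so such rows may need separate handling before plotting.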
Now we can set the shape from a variable in the data frame inside `aes`:
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case,
                      shape = shapeflag)) + # Set the shape
  geom_line() + geom_point(size = 3) + coord_cartesian(ylim = c(0, 550)) +
  labs(x = "Version", y = "Run Time (minutes)") +
  # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = sum, geom = "line") +
  theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  theme(axis.title.y = element_text(vjust = 1)) +
  scale_shape_discrete(labels = c("norm", "null"), name = "runs") # Edit the legend
This will be the plot:
g
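The question also asks about changing the size. As a rough sketch of one way to do that, the flag can be mapped to both `shape` and `size` on the point layer only, with the values picked explicitly via `scale_shape_manual()` and `scale_size_manual()`. The `demo` data frame and the specific shape and size values below are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical, already-flagged data; "b" marks the run that had an NA.
demo <- data.frame(
  Version   = factor(rep(c("1.0.1", "1.0.2", "1.0.4"), times = 2)),
  Case      = rep(c("Case1", "Case2"), each = 3),
  TotalMean = c(350, 430, 470, 320, 560, 450),
  shapeflag = c("a", "a", "a", "a", "b", "a")
)

p <- ggplot(demo, aes(x = Version, y = TotalMean, color = Case, group = Case)) +
  geom_line() +
  # Map the flag on the point layer only, so the lines are unaffected.
  geom_point(aes(shape = shapeflag, size = shapeflag)) +
  # Filled circle for normal runs, a cross for flagged runs.
  scale_shape_manual(values = c(a = 16, b = 4),
                     labels = c("norm", "null"), name = "runs") +
  # Flagged points are drawn larger; same name and labels as the shape scale
  # so the two legends can merge.
  scale_size_manual(values = c(a = 2, b = 4),
                    labels = c("norm", "null"), name = "runs") +
  labs(x = "Version", y = "Run Time (minutes)")
p
```

Giving both scales the same `name` and `labels` lets ggplot2 merge them into a single legend instead of showing one for shape and another for size.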
**Data:**
data <-
structure(list(Version = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1.0.1",
"1.0.2", "1.0.4", "1.0.7"), class = "factor"), First_Run = c(350,
430, 470, 445, 320, 560, 450, 420, 335, 415, 465, 430, 310, 375,
425, 410), Second_Run = c(350, 430, 470, 445, 320, NA, 450, 420,
335, 415, 465, 430, 310, 375, 425, 410), TestNum = c(1, 2, 3,
4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Case = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("Case1",
"Case2", "Case3", "Case4"), class = "factor")), .Names = c("Version",
"First_Run", "Second_Run", "TestNum", "Case"), row.names = c(NA,
-16L), class = "data.frame")