Break a line when consecutive points fall into different groups of a grouping factor

Time: 2017-12-27 13:12:24

Tags: r ggplot2

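I am trying to create a plot in which consecutive points are not connected when they fall into different groups of a grouping factor. Where that happens, the line should break rather than continue.

Below is an example of the data and code I am using, although it does not do what I want.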

species <- c(rep(c("P1","P2","P3","P4","P5","P6","P7","P8"),each=3))
disease <- rep(c("dis1","dis2","dis3"),8)  # 8 species x 3 diseases = 24 rows, matching species and score
score <- c(1,1.7,4,2,5,1,3,4,6,2.5,4,8,2,2,6.2,3,6,4,4,6,1,2,7,4.5)
plantdata <- data.frame(species,disease,score)

#add column for grouping factor
plantdata$valid <- ifelse(plantdata$score <=4, "valid","invalid")
plantdata$status <- paste(plantdata$species,plantdata$valid, sep="_")

library(ggplot2)

ggplot(plantdata, aes(x = disease, y = species)) + 
  geom_point(aes(size=score)) + geom_line(aes(group =status))

This code gives me the plot below.

[Image: plot produced by the code above]

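In the plot above, the horizontal lines connecting the points within a group run across one another; see plant P7, for example. Because the points at (dis1, P7) and (dis2, P7) fall into different categories, I do not want a line between them, even though (dis3, P7) is in the same group as (dis1, P7). So for P7 there should be no line connecting any of its points, because the successive points (dis1, dis2, dis3) fall into different groups of the grouping factor.

In addition, lines should be drawn only between consecutive points that have the "valid" attribute in the grouping factor. For example, along P8 there should also be no line connecting (dis2, P8) and (dis3, P8), because I do not want lines joining points with the "invalid" attribute.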

Here is the same data, updated to include 6 diseases:

> dput(plantdata)
structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"), class = "factor"), disease = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), class = "factor", .Label = c("dis1", "dis2", "dis3", "dis4", "dis5", "dis6")), score = c(1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8, 2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5, 1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8, 2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5)), .Names = c("species", "disease", "score"), row.names = c(NA, -48L), class = "data.frame")

1 Answer:

Answer 0 (score: 0)

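Use grepl to find _valid in status, then apply diff to check whether the status stays the same from one disease to the next. Finally, where the status changes, pass NA to the line layer so that the line is interrupted at that point.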

library(data.table)
setDT(plantdata)
# Make sure that data is sorted by species and disease
setkey(plantdata, species, disease)
# Compare each status with the previous disease within a species:
# SAME is 0 if unchanged, -1 for valid -> invalid, 1 for invalid -> valid
plantdata[, SAME := c(0, diff(grepl("_valid", status))), species]

library(ggplot2)
ggplot(plantdata, aes(species, disease)) + 
    geom_point(aes(size = score)) + 
    geom_line(aes(y = ifelse(SAME == -1, NA, disease))) +
    coord_flip()
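
To make the diff step concrete, here is a small base-R illustration of what it returns for plant P7 in the 3-disease data, whose scores 4, 6, 1 are valid, invalid, valid. The status strings are typed in by hand for this sketch rather than read from plantdata, and the names is_valid and same are used only here.

```r
# Status of plant P7 across dis1..dis3, copied by hand from the question's data
status <- c("P7_valid", "P7_invalid", "P7_valid")

# "_valid" does not occur inside "_invalid", so this flags valid rows only
is_valid <- grepl("_valid", status)

# Difference between each row and the previous one; the first row gets 0
same <- c(0, diff(is_valid))

# -1 marks a valid -> invalid step, 1 an invalid -> valid step,
# and 0 means no change from the previous disease
print(same)
```

In the plotting code above, only the rows where this value is -1 have their line position replaced by NA.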

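Edit: I switched disease to the y-axis so that the NA values are skipped when the line is drawn (and applied coord_flip so that disease is still displayed on the x-axis).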

[Image: plot produced by the answer's code]
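
One thing to keep in mind with the diff-based mask: it blanks a point only where the status drops from valid to invalid (SAME == -1). That is enough for the data posted in the question, but a run of three or more invalid points, or a species whose first point is invalid, would still get line segments touching invalid points. The following is a minimal sketch of a variation, not the answerer's code, that assumes the goal is simply "join consecutive valid points only" and therefore sets the line position to NA at every invalid point. The column name line_y is introduced here for illustration, and the layout (disease on y plus coord_flip) is carried over from the answer.

```r
library(ggplot2)

# Rebuild the 3-disease example data from the question
species <- rep(paste0("P", 1:8), each = 3)
disease <- rep(c("dis1", "dis2", "dis3"), times = 8)
score <- c(1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8,
           2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5)
plantdata <- data.frame(species = factor(species),
                        disease = factor(disease),
                        score = score)

# Keep rows in disease order within each species so the line follows that order
plantdata <- plantdata[order(plantdata$species, plantdata$disease), ]

# Illustrative helper column: the disease position (integer code of the factor)
# for valid points, NA for invalid ones
plantdata$line_y <- ifelse(plantdata$score <= 4,
                           as.integer(plantdata$disease),
                           NA_integer_)

# The NA positions interrupt the line, so only runs of valid points are joined
p <- ggplot(plantdata, aes(species, disease)) +
  geom_point(aes(size = score)) +
  geom_line(aes(y = line_y, group = species)) +
  coord_flip()
print(p)
```

ggplot2 may warn about rows with missing values being dropped from the line layer; that is expected here, since those rows are the invalid points at the end of a species.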