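I am trying to create a plot in which consecutive points are not connected when they fall into different groups of a grouping factor. When that happens, the line should break rather than continue.
Below are the data and the code I am using, although it does not do what I want.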
species <- c(rep(c("P1","P2","P3","P4","P5","P6","P7","P8"),each=3))
disease <- rep(c("dis1","dis2","dis3"),4)
score <- c(1,1.7,4,2,5,1,3,4,6,2.5,4,8,2,2,6.2,3,6,4,4,6,1,2,7,4.5)
plantdata <- data.frame(species,disease,score)
#add column for grouping factor
plantdata$valid <- ifelse(plantdata$score <=4, "valid","invalid")
plantdata$status <- paste(plantdata$species,plantdata$valid, sep="_")
library(ggplot2)
ggplot(plantdata, aes(x = disease, y = species)) +
geom_point(aes(size=score)) + geom_line(aes(group =status))
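From this code, I get the plot below.
In the plot, the horizontal lines that connect points within a group run across points that belong to a different group; see plant P7, for example. Because the points at (dis1, P7) and (dis2, P7) fall into different categories, I do not want a line between them, even though (dis3, P7) is in the same group as (dis1, P7). So for P7 there should be no line connecting any of the points along P7, because the consecutive points (dis1, dis2, dis3) fall into different groups of the grouping factor.
In addition, lines should only be drawn between consecutive points within the grouping factor that have the "valid" attribute. For example, along P8 there should also be no line connecting (dis2, P8) and (dis3, P8), because I do not want lines connecting points with the "invalid" attribute.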
Update: here is the same data with 6 diseases.
> dput(plantdata)
structure(list(species = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L), .Label = c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"), class = "factor"), disease = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), class = "factor", .Label = c("dis1", "dis2", "dis3", "dis4", "dis5", "dis6")), score = c(1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8, 2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5, 1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8, 2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5)), .Names = c("species", "disease", "score"), row.names = c(NA, -48L), class = "data.frame")
Answer 0 (score: 0)
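Use grepl to find _valid in status, then apply diff to check whether the status is the same between consecutive diseases. Finally, where the status differs, pass NA for that point when plotting so that the line breaks there.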
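To make the diff step concrete, here is a small sketch of what it returns for a single species. The three status values are illustrative (they mirror P7 in the question's data) and are not part of the answer's code.

```r
# Illustrative status values for one species, ordered by disease
status <- c("P7_valid", "P7_invalid", "P7_valid")

# TRUE where the status ends in "_valid"; "_invalid" does not contain "_valid"
is_valid <- grepl("_valid", status, fixed = TRUE)

# diff() on a logical vector treats TRUE as 1 and FALSE as 0:
#   0 = same status as the previous disease
#  -1 = changed from valid to invalid
#   1 = changed from invalid to valid
change <- c(0, diff(is_valid))

print(is_valid)
print(change)
```

Note that a value of 0 only says the status did not change; it does not say whether both points are valid or both are invalid.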
library(data.table)
setDT(plantdata)
# Make sure that data is sorted by species and disease
setkey(plantdata, species, disease)
# SAME is 0 when a disease has the same status as the previous one,
# -1 for a change from valid to invalid, 1 for invalid to valid
plantdata[, SAME := c(0, diff(grepl("_valid", status))), species]
library(ggplot2)
ggplot(plantdata, aes(species, disease)) +
geom_point(aes(size = score)) +
geom_line(aes(y = ifelse(SAME == -1, NA, disease))) +
coord_flip()
Edit: I moved disease to the y-axis so that NA values are skipped when the line is drawn (coord_flip is applied to show the diseases on the x-axis).
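As an alternative sketch, not the approach used in the answer above, the line can be restricted to runs of consecutive valid points by giving each run its own group and drawing lines only through the valid rows. The column names valid (stored as a logical here) and run, and the helper vector run_id, are assumptions introduced for this illustration; the data are the question's 3-disease example.

```r
library(ggplot2)

# Rebuild the question's example data (3 diseases per species)
plantdata <- data.frame(
  species = rep(paste0("P", 1:8), each = 3),
  disease = rep(c("dis1", "dis2", "dis3"), times = 8),
  score   = c(1, 1.7, 4, 2, 5, 1, 3, 4, 6, 2.5, 4, 8,
              2, 2, 6.2, 3, 6, 4, 4, 6, 1, 2, 7, 4.5)
)

# Logical flag instead of the "valid"/"invalid" strings used in the question
plantdata$valid <- plantdata$score <= 4

# Runs are only meaningful if rows are ordered by disease within species
plantdata <- plantdata[order(plantdata$species, plantdata$disease), ]

# run_id increases by one each time validity flips within a species
run_id <- ave(as.integer(plantdata$valid), plantdata$species,
              FUN = function(v) cumsum(c(1L, diff(v) != 0L)))
plantdata$run <- paste(plantdata$species, run_id, sep = "_")

# Points for every row; lines only through valid rows, one line per run
p <- ggplot(plantdata, aes(x = disease, y = species)) +
  geom_point(aes(size = score)) +
  geom_line(data = plantdata[plantdata$valid, ], aes(group = run))

print(p)
```

With this grouping, the two valid points of P7 (dis1 and dis3) end up in different runs because an invalid point sits between them, so no line is drawn across dis2. P8 has only one valid point, so it gets no line either.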