Matching ggplot colours across two different datasets

Time: 2018-05-13 15:40:01

Tags: r ggplot2

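I have data in the following format, called data_1 (training data) and data_2 (test data). Each contains 200 observations.

I am trying to plot the data and match the colours from one plot to the next, so that I can see which training line corresponds to which test line. In the end I will use grid.arrange to view the plots side by side, and I think it will be useful to see which training lines match which test lines. grid.arrange(data_1, data_2, ncol=2)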

ggplot(data_1, aes(ID)) + 
  geom_line(aes(y = value, colour = trainauc)) +
  theme(legend.position="none")

ggplot(data_2, aes(ID)) + 
  geom_line(aes(y = value, colour = testauc)) +
  theme(legend.position="none")
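Note that `grid.arrange` takes plot objects rather than data frames, so the two plots would need to be stored first. A minimal sketch, assuming the gridExtra package is installed; the names `toy_1`, `toy_2`, `p1`, `p2` and `combined`, and the toy values, are made up for illustration:

```r
library(ggplot2)
library(gridExtra)

# Toy stand-ins for data_1 and data_2 (values are illustrative only)
toy_1 <- data.frame(ID = rep(1:3, 2),
                    trainauc = rep(c("a", "b"), each = 3),
                    value = c(0.80, 0.90, 0.95, 0.81, 0.91, 0.96))
toy_2 <- data.frame(ID = rep(1:3, 2),
                    testauc = rep(c("c", "d"), each = 3),
                    value = c(0.70, 0.80, 0.88, 0.71, 0.82, 0.89))

# Store each plot in a variable instead of printing it
p1 <- ggplot(toy_1, aes(ID)) +
  geom_line(aes(y = value, colour = trainauc)) +
  theme(legend.position = "none")
p2 <- ggplot(toy_2, aes(ID)) +
  geom_line(aes(y = value, colour = testauc)) +
  theme(legend.position = "none")

# Draw the two plots side by side; the layout is also returned invisibly
combined <- grid.arrange(p1, p2, ncol = 2)
```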

To try to match the plot colours, I have some data in this format (in my case it is a 'data.frame', not a matrix).

matching <- matrix(
  c(0.9497, 0.9579, 0.8838, 0.8896),
  nrow = 2,
  ncol = 2)

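The values in the data.frame called matching are just the values at seq 100 of each series in data_1 and data_2 (that is, the final value of each line). In my data the rows are not ordered, so I am trying to match them up. For example, the line in data_1 whose final value (at seq 100) is 0.9497 corresponds to the line in data_2 whose final value is 0.8838. The rows of the matching data.frame can be used to pair them.

I hope it is reasonably clear what I am trying to do.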
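To make the pairing concrete, here is a minimal sketch with made-up short series (the names `toy_1`, `finals_1` and `paired_test_final` are assumptions, and the real series run to 100 points). It pulls out each training line's final value and looks up the test value that the same row of `matching` assigns to it. Exact matching on numeric values only works if both sides carry identical rounding:

```r
matching <- matrix(c(0.9497, 0.9579, 0.8838, 0.8896), nrow = 2, ncol = 2)

# Toy training data: two short lines ending at 0.9497 and 0.9579
toy_1 <- data.frame(
  ID = rep(1:3, 2),
  trainauc = rep(c("AUC.score.0.9497.4", "AUC.score.0.9579.1"), each = 3),
  value = c(0.84, 0.90, 0.9497, 0.843, 0.91, 0.9579),
  stringsAsFactors = FALSE
)

# Final value of each training line: the rows at the largest ID
finals_1 <- toy_1[toy_1$ID == max(toy_1$ID), c("trainauc", "value")]

# Column 1 of matching holds training finals, column 2 the paired test finals
finals_1$paired_test_final <- matching[match(finals_1$value, matching[, 1]), 2]
print(finals_1)
```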

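Edit: These are the plots I get when I apply this to the whole data. I plotted these same graphs before (without matching the correct colours) and they look the same, except that the horizontal lines make no sense.

Edit 2: This is the original plot I created, without the correct colours.

Edit 3: Here is a sample of the data (dput output, with the trainauc column as in data_1).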

structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
    11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 
    24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 
    37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
    50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 
    63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 
    76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 
    89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
    15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 
    28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 
    41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 
    54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 
    67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 
    80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 
    93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L), trainauc = c("AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1", 
    "AUC.score.0.9579.1"), value = c(0.8403, 0.8755, 0.8899, 0.8967, 
    0.9014, 0.9048, 0.907, 0.9089, 0.9106, 0.9121, 0.913, 0.9143, 
    0.9154, 0.9163, 0.9171, 0.9181, 0.919, 0.9199, 0.9207, 0.9214, 
    0.9222, 0.9229, 0.9232, 0.9237, 0.9242, 0.9247, 0.9253, 0.9257, 
    0.9262, 0.9269, 0.9271, 0.9277, 0.9282, 0.9287, 0.929, 0.9296, 
    0.9301, 0.9307, 0.9311, 0.9316, 0.932, 0.9322, 0.9328, 0.9332, 
    0.9337, 0.934, 0.9344, 0.9346, 0.935, 0.9353, 0.9356, 0.9359, 
    0.9363, 0.9367, 0.9371, 0.9373, 0.9378, 0.9382, 0.9385, 0.9388, 
    0.9391, 0.9394, 0.9397, 0.9399, 0.9402, 0.9406, 0.9408, 0.9411, 
    0.9414, 0.9417, 0.942, 0.9423, 0.9427, 0.9429, 0.9432, 0.9434, 
    0.9436, 0.944, 0.9443, 0.9446, 0.9449, 0.9451, 0.9453, 0.9455, 
    0.9459, 0.9461, 0.9463, 0.9466, 0.9468, 0.9471, 0.9474, 0.9475, 
    0.9479, 0.9481, 0.9484, 0.9486, 0.9488, 0.9491, 0.9494, 0.9497, 
    0.843, 0.8801, 0.89, 0.8968, 0.9016, 0.9051, 0.9078, 0.9098, 
    0.9116, 0.9132, 0.9147, 0.9159, 0.917, 0.9182, 0.9195, 0.9205, 
    0.9212, 0.9221, 0.923, 0.9239, 0.9246, 0.9255, 0.9261, 0.9268, 
    0.9275, 0.9282, 0.929, 0.9297, 0.9303, 0.9309, 0.9315, 0.9321, 
    0.9326, 0.9332, 0.9337, 0.9341, 0.9346, 0.9351, 0.9355, 0.936, 
    0.9364, 0.937, 0.9375, 0.938, 0.9384, 0.9389, 0.9394, 0.9398, 
    0.9402, 0.9406, 0.9411, 0.9416, 0.9419, 0.9423, 0.9428, 0.9432, 
    0.9436, 0.944, 0.9444, 0.9448, 0.9453, 0.9457, 0.946, 0.9464, 
    0.9468, 0.9471, 0.9474, 0.9479, 0.9482, 0.9485, 0.9489, 0.9493, 
    0.9497, 0.95, 0.9504, 0.9507, 0.951, 0.9513, 0.9516, 0.9519, 
    0.9522, 0.9525, 0.9529, 0.9533, 0.9535, 0.9538, 0.9541, 0.9544, 
    0.9548, 0.955, 0.9553, 0.9556, 0.9559, 0.9563, 0.9566, 0.9568, 
    0.9571, 0.9571, 0.9576, 0.9579)), .Names = c("ID", "trainauc", 
    "value"), row.names = 28801:29000, class = "data.frame")

1 Answer:

Answer 0 (score: 1)

The main problem seems to be this. We have a matrix that looks like
head(matching, 3)
#       V1     V2
# 1 0.9241 0.9111
# 2 0.9237 0.9106
# 3 0.9247 0.9110

We want it to look like

                V1               V2
1 AUC.score.0.9241 AUC.score.0.9111
2 AUC.score.0.9237 AUC.score.0.9106
3 AUC.score.0.9247  AUC.score.0.911

while also accounting for duplicates (so we may get AUC.score.0.9241, AUC.score.0.9241.1, AUC.score.0.9241.2, and so on).
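For reference, this is the same suffix scheme that base R's `make.unique()` produces: the first occurrence is left alone and later duplicates get `.1`, `.2`, and so on. A small sketch with made-up values (`vals` and `labels` are illustrative names, not part of the original data):

```r
# Made-up final values, with 0.9241 appearing three times
vals <- c(0.9241, 0.9237, 0.9241, 0.9241)

# Build the label prefix, then let make.unique() number the duplicates
labels <- make.unique(paste0("AUC.score.", vals))
print(labels)
```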

The main strategy is to use split and lapply. First, do the left column:

matching <- as.data.frame(matching)
match_list <- split(matching, matching$V1)
match_out <- lapply(match_list, function(x) {
    x$V1 <- paste("AUC.score", x$V1, 0:(nrow(x) - 1), sep = ".")
    x
})
match_out <- do.call(rbind, match_out)
match_out$V1 <- gsub("\\.0$", "", match_out$V1)
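To see what this step does, here is the same code run on a made-up three-row table (`toy`, `toy_list` and `toy_out` are hypothetical names). Note that `split` followed by `rbind` reorders the rows by the value of `V1`; each row still keeps its own `V2`, so the pairing is preserved:

```r
# Made-up input: 0.9241 appears twice in V1
toy <- data.frame(V1 = c(0.9241, 0.9237, 0.9241),
                  V2 = c(0.9111, 0.9106, 0.9110))

# One data frame per distinct V1 value
toy_list <- split(toy, toy$V1)

# Within each group, number the rows 0, 1, 2, ... and build the label
toy_out <- lapply(toy_list, function(x) {
  x$V1 <- paste("AUC.score", x$V1, 0:(nrow(x) - 1), sep = ".")
  x
})
toy_out <- do.call(rbind, toy_out)

# The first occurrence carries no suffix, so drop the trailing ".0"
toy_out$V1 <- gsub("\\.0$", "", toy_out$V1)
rownames(toy_out) <- NULL
print(toy_out)
```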

Then the right column:

match_list <- split(match_out, match_out$V2)
match_out <- lapply(match_list, function(x) {
    x$V2 <- paste("AUC.score", x$V2, 0:(nrow(x) - 1), sep = ".")
    x
})
match_out <- do.call(rbind, match_out)
match_out$V2 <- gsub("\\.0$", "", match_out$V2)

We do a little cleanup and attach a set of group IDs:

rownames(match_out) <- NULL
match_out$group_id <- 1:nrow(match_out)
head(match_out)
#                   V1                 V2 group_id
# 1 AUC.score.0.9999.4   AUC.score.0.8493        1
# 2      AUC.score.1.8   AUC.score.0.8495        2
# 3 AUC.score.0.9999.3   AUC.score.0.8506        3
# 4   AUC.score.0.9999   AUC.score.0.8508        4
# 5      AUC.score.1.6 AUC.score.0.8508.1        5
# 6      AUC.score.1.2   AUC.score.0.8515        6

Now we merge this data frame with data_1 and data_2:

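# Merge the group ids into both data sets
library(dplyr)
library(ggplot2)  # needed for the plot below
data_1 <- left_join(data_1, select(match_out, trainauc = V1, group_id), by = "trainauc")
data_2 <- left_join(data_2, select(match_out, testauc = V2, group_id), by = "testauc")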
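If dplyr is not available, the same lookup can be sketched in base R with `match()`. The toy tables below are made up for illustration (`toy_match` and `toy_1` are hypothetical names, and the labels are only examples); the column names follow the ones used above:

```r
# Toy lookup table in the shape of match_out
toy_match <- data.frame(V1 = c("AUC.score.0.9497.4", "AUC.score.0.9579.1"),
                        V2 = c("AUC.score.0.8838", "AUC.score.0.8896"),
                        group_id = 1:2,
                        stringsAsFactors = FALSE)

# Toy training data whose series appear in a different order
toy_1 <- data.frame(ID = rep(1:2, 2),
                    trainauc = rep(c("AUC.score.0.9579.1", "AUC.score.0.9497.4"),
                                   each = 2),
                    value = c(0.95, 0.9579, 0.94, 0.9497),
                    stringsAsFactors = FALSE)

# For each row, find the position of its label in V1 and take that group_id
toy_1$group_id <- toy_match$group_id[match(toy_1$trainauc, toy_match$V1)]
print(toy_1)
```

Labels that are missing from the lookup table come back as `NA`, which is the same behaviour a left join gives.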

Plot the result:

bind_rows(train = data_1, test = data_2, .id = "type") %>%
  ggplot(aes(ID)) + 
  geom_line(aes(y = value, colour = factor(group_id))) +
  theme(legend.position="none") +
  facet_wrap("type")

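Because the question mentions `grid.arrange`, here is a hedged sketch of how the colours could be kept consistent across two separate plots instead of facets: give both plots the same named palette through `scale_colour_manual`. The toy data, `group_cols` and `make_plot` are made-up names, and `scales::hue_pal()` is just one way to generate the colours:

```r
library(ggplot2)

# Toy data: group_id plays the role of the merged-in id from above
toy_train <- data.frame(ID = rep(1:3, 2), group_id = rep(1:2, each = 3),
                        value = c(0.84, 0.90, 0.95, 0.85, 0.91, 0.96))
toy_test <- data.frame(ID = rep(1:3, 2), group_id = rep(2:1, each = 3),
                       value = c(0.80, 0.85, 0.89, 0.79, 0.84, 0.88))

# One named palette shared by both plots: names are group ids, values are colours
ids <- sort(unique(c(toy_train$group_id, toy_test$group_id)))
group_cols <- setNames(scales::hue_pal()(length(ids)), ids)

# Hypothetical helper so both plots are built in exactly the same way
make_plot <- function(d) {
  ggplot(d, aes(ID, value, colour = factor(group_id))) +
    geom_line() +
    scale_colour_manual(values = group_cols) +
    theme(legend.position = "none")
}
p_train <- make_plot(toy_train)
p_test <- make_plot(toy_test)

# gridExtra::grid.arrange(p_train, p_test, ncol = 2) would place them side by side
```

Since the palette is keyed by group id rather than by order of appearance, a group keeps its colour even if it is missing from one of the two plots.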