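I have data in the following format, called data_1 (training data) and data_2 (test data). Both contain 200 observations.
I am trying to plot the data and match the colours from one graph to the next, so that I can see which training line matches which test line. In the end I will use grid.arrange to view the plots side by side (ncol = 2), and I think it will be useful to see which training lines match which test lines.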
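library(ggplot2)
library(gridExtra)
p1 <- ggplot(data_1, aes(ID)) +
  geom_line(aes(y = value, colour = trainauc)) +
  theme(legend.position = "none")
p2 <- ggplot(data_2, aes(ID)) +
  geom_line(aes(y = value, colour = testauc)) +
  theme(legend.position = "none")
# grid.arrange() takes the plot objects, not the data frames
grid.arrange(p1, p2, ncol = 2)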
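Why the colours do not line up with the code above: ggplot2's default discrete colour scale hands out colours in factor-level order, and a character column such as trainauc or testauc gets alphabetically sorted levels. The training labels and the test labels are sorted independently, so the same model can land in different colour slots in the two plots. A minimal base-R sketch with hypothetical AUC values (not taken from the question's data):

```r
# Hypothetical final AUCs for three models; position i is the same model in both vectors
train_auc <- c(0.95, 0.97, 0.96)
test_auc  <- c(0.89, 0.85, 0.88)

train_lab <- paste0("AUC.score.", train_auc)
test_lab  <- paste0("AUC.score.", test_auc)

# With default scales, a line's colour "slot" is the index of its label
# among the sorted factor levels
train_slot <- as.integer(factor(train_lab))
test_slot  <- as.integer(factor(test_lab))
rbind(train_slot, test_slot)
```

Here model 1 gets colour slot 1 in the training plot but slot 3 in the test plot, which is the kind of mismatch the question describes.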
To try to match the plot colours, I have some data in this format (it is a data.frame, not a matrix):
matching <- matrix(
c(0.9497, 0.9579, 0.8838, 0.8896),
nrow = 2,
ncol = 2)
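The values in matching are simply the values at ID 100 of each series in data_1 and data_2 (i.e. the final value of each line), but in my data they are not ordered, so I am trying to match them together. So the line in data_1 with a final value (at ID 100) of 0.9497 corresponds to the line in data_2 with a final value of 0.8838, which can be seen from matching.
Hopefully it is somewhat clear what I am trying to do.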
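The link between matching and the plots is the last point of each line. A small base-R sketch of pulling those final values out of a data frame shaped like data_1; toy_train and its numbers are hypothetical, and in the real data the last ID would be 100:

```r
# Hypothetical toy data shaped like data_1: two lines (series), IDs 1 to 3
toy_train <- data.frame(
  ID       = rep(1:3, times = 2),
  trainauc = rep(c("AUC.score.0.9497", "AUC.score.0.9579"), each = 3),
  value    = c(0.8403, 0.9014, 0.9497, 0.8430, 0.9016, 0.9579)
)

# The final value of each line is its value at the largest ID (100 in the real data)
last_id <- max(toy_train$ID)
final_values <- toy_train[toy_train$ID == last_id, c("trainauc", "value")]
final_values
```

Each of these final values should appear in the training column of matching; the same filter on data_2 gives the values for the test column.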
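Edit: these are the graphs I get when applying this to the whole data set. I plotted the same graphs before (without matching the colours correctly) and they look the same, except that the horizontal lines do not make sense.
Edit 3: here is a sample of the data I am working with (dput output):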
structure(list(ID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L,
24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L,
37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L,
50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L,
63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L,
76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L,
89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L,
28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L,
41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L,
54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L,
67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L,
80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L,
93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L), trainauc = c("AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9497.4", "AUC.score.0.9497.4", "AUC.score.0.9497.4",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1", "AUC.score.0.9579.1", "AUC.score.0.9579.1",
"AUC.score.0.9579.1"), value = c(0.8403, 0.8755, 0.8899, 0.8967,
0.9014, 0.9048, 0.907, 0.9089, 0.9106, 0.9121, 0.913, 0.9143,
0.9154, 0.9163, 0.9171, 0.9181, 0.919, 0.9199, 0.9207, 0.9214,
0.9222, 0.9229, 0.9232, 0.9237, 0.9242, 0.9247, 0.9253, 0.9257,
0.9262, 0.9269, 0.9271, 0.9277, 0.9282, 0.9287, 0.929, 0.9296,
0.9301, 0.9307, 0.9311, 0.9316, 0.932, 0.9322, 0.9328, 0.9332,
0.9337, 0.934, 0.9344, 0.9346, 0.935, 0.9353, 0.9356, 0.9359,
0.9363, 0.9367, 0.9371, 0.9373, 0.9378, 0.9382, 0.9385, 0.9388,
0.9391, 0.9394, 0.9397, 0.9399, 0.9402, 0.9406, 0.9408, 0.9411,
0.9414, 0.9417, 0.942, 0.9423, 0.9427, 0.9429, 0.9432, 0.9434,
0.9436, 0.944, 0.9443, 0.9446, 0.9449, 0.9451, 0.9453, 0.9455,
0.9459, 0.9461, 0.9463, 0.9466, 0.9468, 0.9471, 0.9474, 0.9475,
0.9479, 0.9481, 0.9484, 0.9486, 0.9488, 0.9491, 0.9494, 0.9497,
0.843, 0.8801, 0.89, 0.8968, 0.9016, 0.9051, 0.9078, 0.9098,
0.9116, 0.9132, 0.9147, 0.9159, 0.917, 0.9182, 0.9195, 0.9205,
0.9212, 0.9221, 0.923, 0.9239, 0.9246, 0.9255, 0.9261, 0.9268,
0.9275, 0.9282, 0.929, 0.9297, 0.9303, 0.9309, 0.9315, 0.9321,
0.9326, 0.9332, 0.9337, 0.9341, 0.9346, 0.9351, 0.9355, 0.936,
0.9364, 0.937, 0.9375, 0.938, 0.9384, 0.9389, 0.9394, 0.9398,
0.9402, 0.9406, 0.9411, 0.9416, 0.9419, 0.9423, 0.9428, 0.9432,
0.9436, 0.944, 0.9444, 0.9448, 0.9453, 0.9457, 0.946, 0.9464,
0.9468, 0.9471, 0.9474, 0.9479, 0.9482, 0.9485, 0.9489, 0.9493,
0.9497, 0.95, 0.9504, 0.9507, 0.951, 0.9513, 0.9516, 0.9519,
0.9522, 0.9525, 0.9529, 0.9533, 0.9535, 0.9538, 0.9541, 0.9544,
0.9548, 0.955, 0.9553, 0.9556, 0.9559, 0.9563, 0.9566, 0.9568,
0.9571, 0.9571, 0.9576, 0.9579)), .Names = c("ID", "trainauc",
"value"), row.names = 28801:29000, class = "data.frame")
Answer
The main problem seems to be this. We have a matrix that looks like this:
head(matching, 3)
#       V1     V2
# 1 0.9241 0.9111
# 2 0.9237 0.9106
# 3 0.9247 0.9110
and we want it to look like this:
                V1               V2
1 AUC.score.0.9241 AUC.score.0.9111
2 AUC.score.0.9237 AUC.score.0.9106
3 AUC.score.0.9247  AUC.score.0.911
while also handling duplicates (so we might have AUC.score.0.9241, AUC.score.0.9241.1, AUC.score.0.9241.2, and so on).
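If the duplicated labels were originally produced by R's usual de-duplication of names, which appends .1, .2, ... in order of appearance the way make.unique() does, the same labels could be rebuilt directly with make.unique(). That is an assumption about how the trainauc/testauc labels were generated, and the V1/V2 values below are hypothetical. One practical difference from the split-based approach: make.unique() leaves every row in place, so duplicates in V2 are numbered in the original row order rather than in the order left behind after processing V1.

```r
# Hypothetical stand-in for `matching`: one row per model, V1 = train AUC, V2 = test AUC
matching <- data.frame(V1 = c(0.9241, 0.9237, 0.9241, 0.9241),
                       V2 = c(0.9111, 0.9106, 0.9110, 0.9106))

# make.unique() keeps the first occurrence as-is and appends ".1", ".2", ... to repeats
label_auc <- function(x) make.unique(paste0("AUC.score.", x))

matching$V1 <- label_auc(matching$V1)
matching$V2 <- label_auc(matching$V2)
matching
```

Note that 0.9110 becomes AUC.score.0.911, because R drops the trailing zero when converting the number to text, which matches the target layout above.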
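The main strategy is to use split and lapply. First, the left column:
matching <- as.data.frame(matching)
# Split the rows into groups that share the same V1 value
match_list <- split(matching, matching$V1)
# Within each group, number the repeats 0, 1, 2, ... and build the label
match_out <- lapply(match_list, function(x) {
  x$V1 <- paste("AUC.score", x$V1, 0:(nrow(x) - 1), sep = ".")
  x
})
# Stack the groups back into one data frame
match_out <- do.call(rbind, match_out)
# The first occurrence carries no suffix, so drop the trailing ".0"
match_out$V1 <- gsub("\\.0$", "", match_out$V1)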
Then the right column, applying the same steps to V2:
match_list <- split(match_out, match_out$V2)
match_out <- lapply(match_list, function(x) {
  x$V2 <- paste("AUC.score", x$V2, 0:(nrow(x) - 1), sep = ".")
  x
})
match_out <- do.call(rbind, match_out)
match_out$V2 <- gsub("\\.0$", "", match_out$V2)
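The left-column and right-column steps are the same code with V1 swapped for V2, so they can be wrapped in a single helper. A sketch of that refactor, keeping the answer's split/lapply/rbind logic unchanged; add_auc_suffix is a hypothetical name and the toy matching values are made up:

```r
# Hypothetical helper wrapping the split / lapply / rbind steps for one column
add_auc_suffix <- function(df, col) {
  # Group rows that share the same value in `col`
  pieces <- split(df, df[[col]])
  # Number repeats within each group 0, 1, 2, ... and build the label
  pieces <- lapply(pieces, function(x) {
    x[[col]] <- paste("AUC.score", x[[col]], 0:(nrow(x) - 1), sep = ".")
    x
  })
  out <- do.call(rbind, pieces)
  # First occurrences keep no suffix
  out[[col]] <- gsub("\\.0$", "", out[[col]])
  out
}

# Toy stand-in for `matching` (hypothetical values)
matching <- data.frame(V1 = c(0.9241, 0.9237, 0.9241),
                       V2 = c(0.9111, 0.9106, 0.9110))

match_out <- add_auc_suffix(add_auc_suffix(matching, "V1"), "V2")
rownames(match_out) <- NULL
match_out$group_id <- seq_len(nrow(match_out))
match_out
```

The rows come back re-ordered by value, but each row still holds one train/test pair, so the pairing that group_id records is preserved.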
Then we do some clean-up and attach a group ID:
rownames(match_out) <- NULL
match_out$group_id <- 1:nrow(match_out)
head(match_out)
# V1 V2 group_id
# 1 AUC.score.0.9999.4 AUC.score.0.8493 1
# 2 AUC.score.1.8 AUC.score.0.8495 2
# 3 AUC.score.0.9999.3 AUC.score.0.8506 3
# 4 AUC.score.0.9999 AUC.score.0.8508 4
# 5 AUC.score.1.6 AUC.score.0.8508.1 5
# 6 AUC.score.1.2 AUC.score.0.8515 6
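Now we merge this data frame with data_1 and data_2:
# Merge: attach the shared group_id to every training and test row
library(dplyr)
data_1 <- left_join(data_1, select(match_out, trainauc = V1, group_id), by = "trainauc")
data_2 <- left_join(data_2, select(match_out, testauc = V2, group_id), by = "testauc")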
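Without dplyr, base R's merge() can attach group_id the same way (all.x = TRUE keeps every row of the plot data, like left_join). A sketch with hypothetical toy versions of match_out, data_1 and data_2, pairing the final values from the question's matching (0.9497 with 0.8838, 0.9579 with 0.8896):

```r
# Hypothetical toy lookup table: one row per train/test pair
match_out <- data.frame(V1 = c("AUC.score.0.9497", "AUC.score.0.9579"),
                        V2 = c("AUC.score.0.8838", "AUC.score.0.8896"),
                        group_id = 1:2)

# Hypothetical toy plot data, deliberately not in the same order as match_out
data_1 <- data.frame(ID = rep(1:2, 2),
                     trainauc = rep(c("AUC.score.0.9579", "AUC.score.0.9497"), each = 2),
                     value = c(0.90, 0.9579, 0.88, 0.9497))
data_2 <- data.frame(ID = rep(1:2, 2),
                     testauc = rep(c("AUC.score.0.8838", "AUC.score.0.8896"), each = 2),
                     value = c(0.85, 0.8838, 0.86, 0.8896))

# Rename the label column so it matches the key in the plot data, then left-join on it
train_key <- setNames(match_out[c("V1", "group_id")], c("trainauc", "group_id"))
test_key  <- setNames(match_out[c("V2", "group_id")], c("testauc", "group_id"))
data_1 <- merge(data_1, train_key, by = "trainauc", all.x = TRUE)
data_2 <- merge(data_2, test_key, by = "testauc", all.x = TRUE)
```

merge() sorts the result by the key column, which should not matter here because geom_line() draws each line in order of its x values.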
Plot the result:
library(ggplot2)
bind_rows(train = data_1, test = data_2, .id = "type") %>%
  ggplot(aes(ID)) +
  geom_line(aes(y = value, colour = factor(group_id))) +
  theme(legend.position = "none") +
  facet_wrap("type")
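If you would rather keep two separate plots and grid.arrange() from the question instead of facet_wrap(), the same group_id can drive both plots as long as each one uses the same named colour vector. A sketch assuming data_1 and data_2 already carry group_id from the merge step (the toy data below is hypothetical); hcl.colors() needs R 3.6 or later:

```r
library(ggplot2)
library(gridExtra)

# Hypothetical toy data that already has a shared group_id
data_1 <- data.frame(ID = rep(1:3, 2), group_id = rep(1:2, each = 3),
                     value = c(0.84, 0.90, 0.9497, 0.85, 0.91, 0.9579))
data_2 <- data.frame(ID = rep(1:3, 2), group_id = rep(1:2, each = 3),
                     value = c(0.80, 0.86, 0.8838, 0.81, 0.87, 0.8896))

# One colour per group, named by group, so both plots look colours up by name
groups <- sort(unique(c(data_1$group_id, data_2$group_id)))
pal <- setNames(hcl.colors(length(groups)), groups)

p1 <- ggplot(data_1, aes(ID, value, colour = factor(group_id))) +
  geom_line() +
  scale_colour_manual(values = pal) +
  theme(legend.position = "none")
p2 <- ggplot(data_2, aes(ID, value, colour = factor(group_id))) +
  geom_line() +
  scale_colour_manual(values = pal) +
  theme(legend.position = "none")

grid.arrange(p1, p2, ncol = 2)
```

Because scale_colour_manual() matches a named values vector against the factor levels by name, a group keeps its colour even if it is missing from one of the two data frames.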