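I am trying to solve the following problem, but I am finding it hard to explain. I want to assign an incremental value based on the links between two columns (Colours and Letters).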
Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)
df
  Colours Letters
1   Green       X
2     Red       C
3   Green       Y
4   Green       A
5    Blue       C
6     Red       T
7   Brown       P
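I want to assign a value to Group so that all rows with the same Colour, together with any other Colour that shares a Letter with it, end up in the same Group. For example, Group 2 includes Red and Blue because they are linked through the shared letter C.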
Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)
df
  Colours Letters Group
1   Green       X     1
2     Red       C     2
3   Green       Y     1
4   Green       A     1
5    Blue       C     2
6     Red       T     2
7   Brown       P     3
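If another row were added with Colour = Green and Letter = C, the Group column would change to the following. All Greens would be grouped with any other colour that shares a letter with them (here Red, through the letter C). In addition, any colour that shares a letter with Red would be added to the same group as Green (this is the case for Blue, which shares the letter C with Red).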
  Colours Letters Group
1   Green       X     1
2     Red       C     1
3   Green       Y     1
4   Green       A     1
5    Blue       C     1
6     Red       T     1
7   Brown       P     2
8   Green       C     1
Can anyone help?
Answer 0 (score: 0)
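As @Frank pointed out above, what you are describing is a graph problem: you want the group labels to reflect connected components, that is, sets of colours linked by shared letters. By converting the columns to a graph object, you can find the separate components and then return them as groups: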
Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)
Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)
# load the igraph package for working with graphs
library(igraph)
# colour-by-colour matrix: entry [i, j] is non-zero when colours i and j share a letter
adj.mat <- table(df$Colours, df$Letters) %*% t(table(df$Colours, df$Letters))
# visual inspection makes it clear what the components are
g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected', diag = FALSE)
plot(g)
# we create a dataframe that matches each color to a component
mdf <- data.frame(Group_test = components(g)$membership,
                  Colours = names(components(g)$membership))
mdf
#>       Group_test Colours
#> Blue           1    Blue
#> Brown          2   Brown
#> Green          3   Green
#> Red            1     Red
# Then we just match them together
dplyr::left_join(df, mdf)
#> Joining, by = "Colours"
#>   Colours Letters Group Group_test
#> 1   Green       X     1          3
#> 2     Red       C     2          1
#> 3   Green       Y     1          3
#> 4   Green       A     1          3
#> 5    Blue       C     2          1
#> 6     Red       T     2          1
#> 7   Brown       P     3          2
The groups are obviously numbered differently, but the colours are assigned to groups in the same way.
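If the numbering itself matters, the component labels can be renumbered in order of first appearance. The following is a minimal sketch in base R; `relabel` is a hypothetical helper name (not part of igraph), and the two vectors are simply copied from the `Group` and `Group_test` columns in the output above.

```r
# Labels assigned by hand in the question, and labels returned by components()
# (both copied from the joined output above).
by_hand    <- c(1, 2, 1, 1, 2, 2, 3)
from_graph <- c(3, 1, 3, 3, 1, 1, 2)

# Hypothetical helper: renumber labels so the first label seen becomes 1,
# the next new label becomes 2, and so on.
relabel <- function(x) match(x, unique(x))

relabelled <- relabel(from_graph)
relabelled

# After renumbering, the two labelings agree row by row.
identical(relabelled, as.integer(by_hand))
```

Since `match()` returns integers, the comparison converts the hand-written labels with `as.integer()` first.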
We can use the extended case as a sanity check: adding a row that links the colours reduces the set of components to 2:
# examining the extended case as a check
df2 <- data.frame(Colours = c(Colours, "Green"), Letters = c(Letters, "C"))
df2
#>   Colours Letters
#> 1   Green       X
#> 2     Red       C
#> 3   Green       Y
#> 4   Green       A
#> 5    Blue       C
#> 6     Red       T
#> 7   Brown       P
#> 8   Green       C
# let's wrap the procedure in a function for convenience
getGroup <- function(col, let, plot = FALSE){
  # colour-by-colour matrix: non-zero where two colours share a letter
  adj.mat <- table(col, let) %*% table(let, col)
  g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected',
                                   diag = FALSE)
  if (plot) {plot(g)}
  comps <- components(g)$membership
  mdf <- data.frame(Group = comps, Colours = names(comps))
  mdf
}
# we get our desired group key (which we can merge back to the dataframe)
getGroup(df2$Colours, df2$Letters)
#>       Group Colours
#> Blue      1    Blue
#> Brown     2   Brown
#> Green     1   Green
#> Red       1     Red
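The comment above mentions merging the group key back to the dataframe. One way to do that in base R is a lookup with `match()`, sketched below. To keep the snippet self-contained, `key` is typed in by hand from the `getGroup()` output above rather than computed, and `df2` is rebuilt from the extended data.

```r
# Group key copied by hand from the getGroup() output above.
key <- data.frame(Group   = c(1, 2, 1, 1),
                  Colours = c("Blue", "Brown", "Green", "Red"))

# The extended data from the question (8 rows).
df2 <- data.frame(
  Colours = c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown", "Green"),
  Letters = c("X", "C", "Y", "A", "C", "T", "P", "C")
)

# For each row, look up the group of its colour in the key.
df2$Group <- key$Group[match(df2$Colours, key$Colours)]
df2
```

A `dplyr::left_join(df2, key)` as used earlier should give the same column; `match()` just avoids the extra dependency.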
Created on 2018-11-07 by the reprex package (v0.2.1)
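For readers who would rather not depend on igraph, the same connected-components idea can be sketched in base R by repeatedly pushing the smallest group id through rows that share a colour or a letter until nothing changes. This is an alternative illustration, not the method used in the answer above; `get_group_base` is a hypothetical name, and the loop is meant for small inputs rather than tuned for speed.

```r
# Hypothetical igraph-free helper: returns one group id per row.
get_group_base <- function(col, let) {
  col <- as.character(col)
  let <- as.character(let)
  # Start with one group per distinct colour, numbered by first appearance.
  grp <- match(col, unique(col))
  repeat {
    # Each row takes the smallest id among rows with the same colour...
    new <- ave(grp, col, FUN = min)
    # ...and then the smallest id among rows with the same letter.
    new <- ave(new, let, FUN = min)
    if (identical(new, grp)) break   # no label changed: components are stable
    grp <- new
  }
  # Renumber the surviving ids as 1, 2, 3, ... in order of first appearance.
  match(grp, unique(grp))
}

Colours <- c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown")
Letters <- c("X", "C", "Y", "A", "C", "T", "P")

# Original data from the question.
get_group_base(Colours, Letters)

# Extended data with the extra Green / C row.
get_group_base(c(Colours, "Green"), c(Letters, "C"))
```

Because the ids are renumbered by first appearance, the result can be assigned straight to a `Group` column and compared with the tables in the question.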