Create a group key across two columns

Time: 2018-11-07 11:28:19

Tags: r

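I am trying to solve the following problem, but I find it hard to explain. I want to assign an incrementing value based on the links between two columns (Colours and Letters).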

Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)
df

    Colours Letters
1   Green       X
2     Red       C
3   Green       Y
4   Green       A
5    Blue       C
6     Red       T
7   Brown       P

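I want to assign a Group value so that all rows with the same colour, plus any other colour that shares a letter with it, end up in the same group. For example, group 2 contains Red and Blue because they are linked through the shared letter C.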

Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)
df
    Colours Letters Group
1   Green       X     1
2     Red       C     2
3   Green       Y     1
4   Green       A     1
5    Blue       C     2
6     Red       T     2
7   Brown       P     3

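If another row were added with Colours = Green and Letters = C, the Group column would change to the following. All Greens would be grouped with any other colour that shares a letter with Green (here Red, through the letter C). In addition, any colour that shares a letter with Red would join the same group as Green (this is the case for Blue, which shares the letter C with Red).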

  Colours Letters Group
1   Green       X     1
2     Red       C     1
3   Green       Y     1
4   Green       A     1
5    Blue       C     1
6     Red       T     1
7   Brown       P     2
8   Green       C     1

Can anyone help?

1 answer:

Answer 0: (score: 0)

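As @Frank pointed out above, what you describe is a graph problem: you want the group labels to reflect connected components, i.e. colours that share letters. By converting the columns into a graph object, you can work out the separate components and then return them as groups: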

Colours <- c("Green","Red","Green","Green","Blue","Red","Brown")
Letters <- c("X","C","Y","A","C","T","P")
df <- data.frame(Colours,Letters)

Group <- c(1,2,1,1,2,2,3)
df <- data.frame(df,Group)

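# load the igraph package for working with graphs
library(igraph)
# cross-tabulate colours by letters; the product with its transpose is a
# colour-by-colour matrix that is non-zero wherever two colours share a letter
adj.mat <- table(df$Colours, df$Letters) %*% t(table(df$Colours, df$Letters))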

# visual inspection makes it clear what the components are
g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected', diag = FALSE)
plot(g)

# we create a dataframe that matches each color to a component
mdf <- data.frame(Group_test = components(g)$membership,
                  Colours = names(components(g)$membership))

mdf
#>       Group_test Colours
#> Blue           1    Blue
#> Brown          2   Brown
#> Green          3   Green
#> Red            1     Red

# Then we just match them together
dplyr::left_join(df, mdf)
#> Joining, by = "Colours"
#>   Colours Letters Group Group_test
#> 1   Green       X     1          3
#> 2     Red       C     2          1
#> 3   Green       Y     1          3
#> 4   Green       A     1          3
#> 5    Blue       C     2          1
#> 6     Red       T     2          1
#> 7   Brown       P     3          2

The groups are numbered differently, but the colours are assigned to groups in the same way.
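If the labels should follow the numbering used in the question, one option is to relabel the component ids by order of first appearance. This is a minimal base R sketch; `group_test` is a hypothetical vector typed in from the Group_test column of the join above, not something the original code creates:

```r
# Component ids per row, copied by hand from the Group_test column above
group_test <- c(3, 1, 3, 3, 1, 1, 2)

# match() against unique() renumbers the ids in order of first appearance
group <- match(group_test, unique(group_test))
group
#> [1] 1 2 1 1 2 2 3
```

The result agrees with the Group column given in the question.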

We can treat the extended case as a sanity check: adding a row that links the colours should reduce the set of components to 2:

# examining the extended case as a check
df2 <- data.frame(Colours = c(Colours, "Green"), Letters = c(Letters, "C"))
df2
#>   Colours Letters
#> 1   Green       X
#> 2     Red       C
#> 3   Green       Y
#> 4   Green       A
#> 5    Blue       C
#> 6     Red       T
#> 7   Brown       P
#> 8   Green       C

# let's wrap the procedure in a function for convenience
getGroup <- function(col, let, plot = FALSE){
  adj.mat <- table(col, let) %*% table(let, col)
  g <- graph_from_adjacency_matrix(adj.mat, mode = 'undirected',
                                   diag = FALSE)
  if (plot) {plot(g)}
  comps <- components(g)$membership
  mdf <- data.frame(Group = comps, Colours = names(comps))
  mdf
}

# we get our desired group key (which we can merge back to the dataframe)
getGroup(df2$Colours, df2$Letters)
#>       Group Colours
#> Blue      1    Blue
#> Brown     2   Brown
#> Green     1   Green
#> Red       1     Red
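
The merge mentioned in the comment above is not shown, so here is one way it could look, using a base R lookup instead of a join. The `key` data frame is a hypothetical name and is typed in by hand from the output above rather than computed; `stringsAsFactors = FALSE` is an assumption that keeps the colours as plain character vectors:

```r
# The extended data from the question
df2 <- data.frame(
  Colours = c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown", "Green"),
  Letters = c("X", "C", "Y", "A", "C", "T", "P", "C"),
  stringsAsFactors = FALSE
)

# Lookup table copied by hand from the getGroup() output above
key <- data.frame(
  Group   = c(1, 2, 1, 1),
  Colours = c("Blue", "Brown", "Green", "Red"),
  stringsAsFactors = FALSE
)

# For each row, find its colour in the key and take that colour's group
df2$Group <- key$Group[match(df2$Colours, key$Colours)]
df2$Group
#> [1] 1 1 1 1 1 1 2 1
```

This reproduces the Group column the question asks for in the extended case.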

Created on 2018-11-07 by the reprex package (v0.2.1)
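
For cases where igraph is not available, the same connected-components idea can be sketched in base R by repeatedly propagating the smallest group id across shared letters and shared colours until nothing changes. This is not the method used in the answer above; `group_key` is a hypothetical helper name, and the sketch assumes there are no missing values in either column:

```r
# Hypothetical helper: one group id per row, numbered by first appearance
group_key <- function(col, let) {
  col <- as.character(col)
  let <- as.character(let)
  grp <- match(col, unique(col))      # start with one group per colour
  repeat {
    old <- grp
    # rows sharing a letter take the smallest id found among them
    grp <- ave(grp, let, FUN = min)
    # rows sharing a colour take the smallest id found among them
    grp <- ave(grp, col, FUN = min)
    if (identical(grp, old)) break    # stop once the ids are stable
  }
  match(grp, unique(grp))             # renumber as 1, 2, 3, ...
}

Colours <- c("Green", "Red", "Green", "Green", "Blue", "Red", "Brown")
Letters <- c("X", "C", "Y", "A", "C", "T", "P")

group_key(Colours, Letters)
#> [1] 1 2 1 1 2 2 3
group_key(c(Colours, "Green"), c(Letters, "C"))
#> [1] 1 1 1 1 1 1 2 1
```

Each pass can only lower an id, so the loop terminates. It may need several passes on long chains of links, so the igraph version is likely the better choice for large data.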