How to look up values in another column and assign the corresponding ID

Time: 2017-10-07 13:32:29

Tags: r dplyr

I have a toy example to explain what I am trying to do:

aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))

I managed to assign unique IDs to column y, and the output now looks like this:

aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))

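As you can see, "b" is present in both col x and col y. In col y we assigned id = 1 to "b" and id = 2 to "a", and so on. These values also appear in col x: col x has "a" as its first element, and since "a" is in col y with id = 2, I would assign id = 2 to it in col x as well. What I want to do next is look up each value in col x and, if it appears in col y, assign it the same id.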

The final data frame would look like this:

aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))
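The lookup the question describes (ids taken from the order of first appearance in y, then reused for x) can be sketched with a named vector acting as a lookup table. This is an illustrative alternative to the answers below, assuming character columns; the name ids is hypothetical. It produces integer columns rather than the character ids shown in aski3.

```r
# Toy data from the question, kept as character columns (not factors)
aski <- data.frame(x = c("a","b","c","a","d","d"),
                   y = c("b","a","d","a","b","c"),
                   stringsAsFactors = FALSE)

# Hypothetical lookup table: each distinct value of y, in order of first
# appearance, gets the ids 1, 2, 3, ... ("b" = 1, "a" = 2, "d" = 3, "c" = 4)
ids <- setNames(seq_along(unique(aski$y)), unique(aski$y))

# Index the named vector by the letters in each column; unname() drops the
# letter names carried over from the lookup vector
aski3 <- data.frame(x = unname(ids[aski$x]), y = unname(ids[aski$y]))
aski3
```

A letter in x that is not among names(ids) would come back as NA, which matches the "if it appears in col y" condition in the question.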

3 Answers:

Answer 0 (score: 2)

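There is no need to create an intermediate data frame. A possible solution is to use match together with lapply to get a numeric representation of the letters:

# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)

# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)

which gives:

> aski
  x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4

Assigning to aski[] rather than to aski keeps the result a data frame instead of turning it into a plain list.

If you want the numbers as characters, you can also do:

aski[] <- lapply(aski, as.character)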
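One caveat with match: it returns NA for any value in x that never occurs in y. A minimal sketch, assuming you would rather give such values new ids after the existing ones (the extra "e" in the data and the names x_ids and v_all are hypothetical, and this extension is not part of the answer):

```r
# Hypothetical variant of the toy data: "e" occurs in x but not in y
aski <- data.frame(x = c("a","b","c","a","d","e"),
                   y = c("b","a","d","a","b","c"),
                   stringsAsFactors = FALSE)

# Lookup built from y alone: "e" has no position in v, so match() gives NA
v <- unique(aski$y)
x_ids <- match(aski$x, v)
x_ids

# Lookup built from y first, then x: y keeps ids 1-4, "e" is appended as 5
v_all <- unique(c(aski$y, aski$x))
aski[] <- lapply(aski, match, v_all)
aski
```

Building the lookup from unique(c(y, x)) is the same ordering idea that Answer 1 uses for its factor levels.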

Answer 1 (score: 1)

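First, convert both columns to character vectors. Then collect all unique values from the two columns, taking y first, to use as the factor levels.

Convert both columns to factors with those levels, then to numbers.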

aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))

aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)

lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)

aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski
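The explicit levels are what make this work: factor() sorts its levels alphabetically by default, so without them the codes would not follow the order of first appearance in y. A short comparison on the toy data (the standalone x and y vectors here are just copies of the two columns):

```r
x <- c("a","b","c","a","d","d")
y <- c("b","a","d","a","b","c")

# Default levels are sorted: a = 1, b = 2, c = 3, d = 4
default_codes <- as.integer(factor(y))
default_codes   # 2 1 4 1 2 3

# Levels in order of first appearance (y before x): b = 1, a = 2, d = 3, c = 4
lev <- unique(c(y, x))
y_codes <- as.integer(factor(y, levels = lev))
x_codes <- as.integer(factor(x, levels = lev))
y_codes   # 1 2 3 2 1 4
x_codes   # 2 1 4 2 3 3
```

as.integer() and as.numeric() give the same codes here; as.integer() just keeps them as integers.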

Answer 2 (score: 1)

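A solution using dplyr. We can first create a vector, vec, that records the relationship between index and letter, using unique(aski$y). After that step, you can either use Jaap's lapply solution or use mutate_all from dplyr, as shown below.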

# Create the vector showing the relationship of index and letter 
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"

library(dplyr)

# Modify all columns (a ~ formula lambda; funs() is deprecated in current dplyr)
aski2 <- aski %>% mutate_all(~ match(., vec))
# View the results
aski2
  x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
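In dplyr 1.0.0 and later, mutate_all() is superseded by across() inside mutate(). A sketch of the same step in that style, assuming a recent dplyr is installed:

```r
library(dplyr)

aski <- data.frame(x = c("a","b","c","a","d","d"),
                   y = c("b","a","d","a","b","c"),
                   stringsAsFactors = FALSE)

# Ids follow the order of first appearance in y: "b" "a" "d" "c"
vec <- unique(aski$y)

# across(everything(), ...) applies the lambda to every column; .x is the column
aski2 <- aski %>% mutate(across(everything(), ~ match(.x, vec)))
aski2
```

To convert only some columns, replace everything() with a selection such as c(x, y).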

Data

aski <- data.frame(x = c("a","b","c","a","d","d"),
                   y = c("b","a","d","a","b","c"),
                   stringsAsFactors = FALSE)