I have a toy example to explain what I am trying to do:
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
I managed to assign unique IDs to column y, and the output now looks like this:
aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))
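As you can see, "b" appears in both col x and col y. In col y we assigned id = 1 to "b", id = 2 to "a", and so on. These values also appear in col x. For example, col x has "a" as its first element; "a" is also in col y, where it was assigned id = 2, so I want to assign id = 2 to it in col x as well. What I want to do next is look up each value in col x and, if it appears in col y, assign it the same id.
The final data frame should look like this: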
aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))
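The lookup described in the question can be sketched with a named vector: the letters of y become the names, the ids already assigned to y become the values, and col x is then used to index it. This is a minimal sketch, assuming the ids from aski2 are kept as character strings; the names id_of and x_ids are hypothetical.

```r
# Toy data from the question (stringsAsFactors = FALSE for R < 4.0)
aski  <- data.frame(x = c("a","b","c","a","d","d"),
                    y = c("b","a","d","a","b","c"),
                    stringsAsFactors = FALSE)
aski2 <- data.frame(x = c("a","b","c","a","d","d"),
                    y = c("1","2","3","2","1","4"),
                    stringsAsFactors = FALSE)

# Hypothetical lookup table: letter in y -> id assigned to it.
# Repeated letters always carry the same id, so dropping duplicates is safe.
id_of <- setNames(aski2$y, aski$y)
id_of <- id_of[!duplicated(names(id_of))]

# Look up every value of x by name; a letter absent from y would give NA
x_ids <- unname(id_of[aski$x])

aski3 <- data.frame(x = x_ids, y = aski2$y, stringsAsFactors = FALSE)
aski3
```

Indexing by name keeps the ids exactly as they were assigned to y, so this works even if the ids were not generated in order of first appearance.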
Answer 0 (score: 2)
There is no need to create factors as an intermediate step. A possible solution is to use match together with lapply to get a numeric representation of the letters:
# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)
# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)
This gives:
> aski
  x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
If you want the numbers as characters, you can additionally do:
aski[] <- lapply(aski, as.character)
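Because match returns NA for values it cannot find, this approach also covers the "if it appears in col y" condition from the question: a letter in x that never occurs in y simply gets NA. A minimal sketch with a hypothetical data frame df in which the letter "e" appears in x but not in y:

```r
# Hypothetical data: "e" occurs in x but never in y
df <- data.frame(x = c("a", "e", "c"),
                 y = c("c", "a", "a"),
                 stringsAsFactors = FALSE)

v <- unique(df$y)             # "c" "a": ids follow first appearance in y
df[] <- lapply(df, match, v)  # "e" has no match in v, so it becomes NA
df
```

If letters that are missing from y should receive new ids instead of NA, building v as unique(c(df$y, df$x)) appends them after the letters of y.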
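Answer 1 (score: 1)
First, convert both columns to character vectors. Then collect all unique values from both columns to use as the factor levels, listing the values of y first so that they keep their ids.
Convert both columns to factors, and then to numbers.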
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)
lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)
aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski
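The factor route and the match route produce the same integer codes: converting with explicit levels and taking as.integer gives each value's position in lev, which is exactly what match(x, lev) returns. A small check, building lev the same way as in the answer; the names codes_factor and codes_match are illustrative:

```r
x <- c("a","b","c","a","d","d")
y <- c("b","a","d","a","b","c")

# Levels in order of first appearance: y first, then anything new from x
lev <- unique(c(y, x))

# Factor codes are positions in the levels vector
codes_factor <- as.integer(factor(x, levels = lev))
# match() returns the same positions directly, without building a factor
codes_match  <- match(x, lev)

identical(codes_factor, codes_match)
codes_match
```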
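Answer 2 (score: 1)
A solution using dplyr. We can first create a vector, vec, that records the relationship between index and letter, as unique(aski$y). After this step, you can either use Jaap's lapply solution or use mutate_all from dplyr, as shown below.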
# Create the vector showing the relationship of index and letter
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"
library(dplyr)
# Modify all columns
aski2 <- aski %>% mutate_all(~ match(., vec))
# View the results
aski2
  x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
Data
aski <- data.frame(x = c("a","b","c","a","d","d"),
y = c("b","a","d","a","b","c"),
stringsAsFactors = FALSE)
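In dplyr 1.0.0 and later, mutate_all() is superseded by across() inside mutate(). A sketch of the same step in that style, assuming dplyr 1.0.0 or newer is installed:

```r
library(dplyr)

aski <- data.frame(x = c("a","b","c","a","d","d"),
                   y = c("b","a","d","a","b","c"),
                   stringsAsFactors = FALSE)

# Index = position of each letter's first appearance in y
vec <- unique(aski$y)

# Apply match() to every column; .x is the current column inside across()
aski2 <- aski %>% mutate(across(everything(), ~ match(.x, vec)))
aski2
```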