Match and replace operations in a data frame in R

Time: 2015-05-14 21:11:23

Tags: r replace pattern-matching match vlookup

Let's say my dataset is as follows:

John   NA    kaira   
carry  John  NA
maya   Sam   maya
leo    paty  leo
tinker NA    tinker
fabo   leo   maya

I have another dataset:

John   1
carry  2
maya   3
leo    4
tinker 5
fabo   6
sam    7
paty   8 
kaira  9

I want to match the values from the table above (df2) against the first table (df1), so that my final table (df) looks like this:

1   NA   9   
2   1    NA
3   7    3
4   8    4
5   NA   5
6   4    3

5 Answers:

Answer 0: (score: 6)

You could also do

df1[] <- match(unlist(df1), df2$V1)
#   V1 V2 V3
# 1  1 NA  9
# 2  2  1 NA
# 3  3 NA  3
# 4  4  8  4
# 5  5 NA  5
# 6  6  4  3

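If the numbers in df2 are not always in order, match alone returns row positions rather than the codes, so a slightly adjusted version of the code would be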

df1[] <- df2[match(unlist(df1), df2$V1), 2]
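
To see why indexing the second column of df2 matters, here is a minimal self-contained sketch. The shuffled lookup table and its codes are made up for illustration (they are not the question's data), and stringsAsFactors = FALSE is assumed so that the names stay plain character strings.

```r
# Hypothetical data: a lookup table whose codes are NOT in row order
df1 <- data.frame(V1 = c("John", "carry", "maya"),
                  V2 = c(NA, "John", "maya"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("maya", "John", "carry"),
                  V2 = c(30L, 10L, 20L),
                  stringsAsFactors = FALSE)

# match() on its own returns row positions in df2, not the codes
pos <- match(unlist(df1), df2$V1)

# Indexing the code column by those positions yields the actual codes;
# assigning into codes[] keeps the data frame shape and column names
codes <- df1
codes[] <- df2[pos, 2]
print(codes)
```

With this data, pos and the final codes differ, which is exactly the case the adjusted line is meant to handle.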

Answer 1: (score: 4)

You can do the lookup with match:

apply(df1, 2, function(x) df2[,2][match(x, df2[,1])])
#      V1 V2 V3
# [1,]  1 NA  9
# [2,]  2  1 NA
# [3,]  3 NA  3
# [4,]  4  8  4
# [5,]  5 NA  5
# [6,]  6  4  3

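You'll notice that I get an extra NA value in the second column, because "Sam" in the first data frame does not match "sam" in the second data frame: match is case-sensitive. If you don't care about case, you can try: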

apply(df1, 2, function(x) df2[,2][match(tolower(x), tolower(df2[,1]))])
#      V1 V2 V3
# [1,]  1 NA  9
# [2,]  2  1 NA
# [3,]  3  7  3
# [4,]  4  8  4
# [5,]  5 NA  5
# [6,]  6  4  3
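
Because apply() returns a matrix, a variant that keeps the result as a data frame may be handy. The following is only a sketch and not part of the original answer: it rebuilds the question's two tables with assumed column names (V1, V2, V3) and character columns, and lookup_ci is a hypothetical helper name.

```r
# The question's data, rebuilt with assumed column names and character columns
df1 <- data.frame(
  V1 = c("John", "carry", "maya", "leo", "tinker", "fabo"),
  V2 = c(NA, "John", "Sam", "paty", NA, "leo"),
  V3 = c("kaira", NA, "maya", "leo", "tinker", "maya"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c("John", "carry", "maya", "leo", "tinker", "fabo",
         "sam", "paty", "kaira"),
  V2 = 1:9,
  stringsAsFactors = FALSE
)

# lookup_ci (hypothetical helper): case-insensitive lookup for one column
lookup_ci <- function(x, keys, values) {
  values[match(tolower(x), tolower(keys))]
}

# Assigning into df[] keeps the data.frame class and the column names
df <- df1
df[] <- lapply(df1, lookup_ci, keys = df2$V1, values = df2$V2)
print(df)
```

The design choice here is lapply() over apply(): lapply() works column by column without first coercing the data frame to a matrix, so each result column keeps its integer type.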

Answer 2: (score: 3)

Try:

library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
df1 %>% mutate(across(everything(), ~ df2[, 2][match(.x, df2[, 1])]))

Answer 3: (score: 1)

You can simply use mapvalues from plyr:

library(plyr)
mapvalues(tolower(as.matrix(df)), tolower(df1$V1), df1$V2)

#     V1  V2  V3 
#[1,] "1" NA  "9"
#[2,] "2" "1" NA 
#[3,] "3" "7" "3"
#[4,] "4" "8" "4"
#[5,] "5" NA  "5"
#[6,] "6" "4" "3"

Data (in this answer, df is the table of names and df1 is the lookup table):

df = structure(list(V1 = structure(c(3L, 1L, 5L, 4L, 6L, 2L), .Label = c("carry", 
"fabo", "John", "leo", "maya", "tinker"), class = "factor"), 
V2 = structure(c(NA, 1L, 4L, 3L, NA, 2L), .Label = c("John", 
"leo", "paty", "Sam"), class = "factor"), V3 = structure(c(1L, 
NA, 3L, 2L, 4L, 3L), .Label = c("kaira", "leo", "maya", "tinker"
), class = "factor")), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
-6L))

df1 = structure(list(V1 = structure(c(3L, 1L, 6L, 5L, 9L, 2L, 8L, 7L, 
4L), .Label = c("carry", "fabo", "John", "kaira", "leo", "maya", 
"paty", "sam", "tinker"), class = "factor"), V2 = 1:9), .Names = c("V1", 
"V2"), class = "data.frame", row.names = c(NA, -9L))

Answer 4: (score: 1)

If we can drop the factors:

df3 <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)

Then

df3[!is.na(df3)] <- match(df3[!is.na(df3)] , as.character(df1[,1]))
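
Two things are worth checking with this pattern: the columns of df3 are character, so the matched positions end up stored as strings, and the match is case-sensitive. The sketch below uses made-up miniature data (not this answer's objects; keys is a hypothetical stand-in for the lookup column) to show both, plus one possible way to convert back to integers.

```r
# Hypothetical miniature data playing the same roles as above
df3 <- data.frame(V1 = c("John", "maya"),
                  V2 = c(NA, "Sam"),
                  stringsAsFactors = FALSE)
keys <- c("John", "maya", "sam")

# Same assignment pattern: replace every non-NA cell by its match position.
# "Sam" does not match "sam", so that cell becomes NA.
df3[!is.na(df3)] <- match(df3[!is.na(df3)], keys)

# The columns are still character at this point
str(df3)

# Convert every column to integer while keeping the data frame shape
df3[] <- lapply(df3, as.integer)
str(df3)
```

If case should be ignored, wrapping both arguments of match() in tolower(), as in Answer 1, would apply here as well.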