Question

I have the following two data frames:

 n <- 15000
 key <- sample(1:10, 10)
 dictionary <- data.frame(key = key, value = LETTERS[1:10])

 target_df <- data.frame(code = sample(key, n, replace = TRUE))
 target_df$code[sample(seq_len(n), 10)] <- 0

I want to overwrite code with the corresponding value from dictionary. What is an efficient and readable way to do this? I have used

# Look up each code in the dictionary; codes without an entry become NA
find_in_dictionary <- function(x) {
  dictionary$value[match(x, dictionary$key)]
}

target_df$code <- find_in_dictionary(target_df$code)
sum(is.na(target_df$code))

It seems to work and handles non-matches correctly. Do you have a better suggestion?
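To make the non-match behaviour concrete, here is a minimal sketch on a small hand-made dictionary. The toy data and the names codes, idx and looked_up are assumptions for illustration only, not part of the data above. It relies on base R's match(), which returns NA for any value it cannot find.

```r
# Hypothetical toy data, much smaller than the data in the question
dictionary <- data.frame(key = c(3L, 1L, 2L),
                         value = c("A", "B", "C"),
                         stringsAsFactors = FALSE)
codes <- c(1L, 2L, 0L, 3L, 2L)  # 0 has no entry in the dictionary

# match() gives the dictionary row for each code, or NA if there is none
idx <- match(codes, dictionary$key)

# Indexing with NA yields NA, so unmatched codes stay NA in the result
looked_up <- dictionary$value[idx]

print(idx)        # 2 3 NA 1 3
print(looked_up)  # "B" "C" NA "A" "C"
```

Counting the NA values afterwards, as sum(is.na(...)) does above, is a quick check of how many codes had no dictionary entry.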

Answer 1

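Using sqldf: map each key to its value with a left join of the two data frames on key.

Before running this, you only need to change the colnames of target_df:

library(sqldf)

# Rename the column so that both data frames share the join column "key"
colnames(target_df) <- c("key")
head(sqldf("SELECT t.key, d.value FROM target_df t LEFT JOIN dictionary d ON (t.key = d.key)"))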
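For readers without sqldf, the same left-join idea can be sketched in base R with merge() and all.x = TRUE. This is an alternative illustration, not the method of this answer, and the toy data below is hypothetical. Note that merge() sorts the result by the join column by default, so the original row order is not preserved.

```r
# Hypothetical toy data for illustration
dictionary <- data.frame(key = c(3L, 1L, 2L),
                         value = c("A", "B", "C"),
                         stringsAsFactors = FALSE)
target_df <- data.frame(code = c(1L, 2L, 0L, 3L, 2L))

# all.x = TRUE keeps every row of target_df (a left join);
# codes with no dictionary entry get NA in the value column
joined <- merge(target_df, dictionary,
                by.x = "code", by.y = "key", all.x = TRUE)

print(joined)
```

If the original row order matters, the match()-based lookup from the question may be the simpler choice, since it returns values in the order of the input.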

Answer 2

You need to use the left_join function from dplyr. This is an SQL join.

library(dplyr)
library(tidyr)

n <- 15000
key <- sample(1:10, 10)
dictionary <- data.frame(key = key, value = LETTERS[1:10])

target_df <- data.frame(code = sample(key, n, replace = TRUE))
target_df$code[sample(seq_len(n), 10)] <- 0

# Sort by code, join on code = key, then drop the unmatched rows (value is NA)
target_df %>%
  arrange(code) %>%
  left_join(dictionary, by = c("code" = "key")) %>%
  drop_na(.) -> final_df

head(final_df)
#>    code value
#> 11    1     I
#> 12    1     I
#> 13    1     I
#> 14    1     I
#> 15    1     I
#> 16    1     I

# final_df without 'order'
target_df %>%
  left_join(dictionary, by = c("code" = "key")) %>%
  drop_na(.) %>%
  head(.)
#>   code value
#> 1    6     A
#> 2    6     A
#> 3    8     D
#> 4    7     F
#> 5    8     D
#> 6    9     H

final_df %>%
  select(value) %>%
  head(.)
#>    value
#> 11     I
#> 12     I
#> 13     I
#> 14     I
#> 15     I
#> 16     I

Similar results can also be obtained with other packages. There are many existing questions about this.

Created on 2018-08-30 by the reprex package (v0.2.0).

Given a column of keys, overwrite it with a column of strings based on a dictionary