I have a .csv table that looks like this:
+----+------------------+
| ID | CODE             |
+----+------------------+
| 1  | W002, W103, W111 |
| 2  | W002, W104       |
| 3  | W103, W111, W202 |
| 4  | W202, W103       |
+----+------------------+
I have a separate lookup table that describes what each "WXXX" code means. It looks like this:
+------+--------+
| ID   | CODE   |
+------+--------+
| W002 | Blue   |
| W103 | Red    |
| W111 | Green  |
| W202 | Orange |
+------+--------+
I want to create a dataset that merges the two, like this:
+----+--------------------+
| ID | Code               |
+----+--------------------+
| 1  | Blue, Red, Green   |
| 2  | Blue, Red          |
| 3  | Red, Green, Orange |
| 4  | Orange, Red        |
+----+--------------------+
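I tried running a VLOOKUP in Excel, but it does not recognize the individual Wxxx codes inside each cell. I would rather do all of this in R anyway, but a left join does not work there either: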
data_codes <- read.csv("data.csv", header = TRUE)
data_colors <- read.csv("colors.csv", header = TRUE)
data_join <- merge(x = data_codes, y = data_colors, by = "ID", all.x = TRUE)
The actual result does not show a value for each of the comma-separated Wxxx codes.
Answer 0 (score: 0)
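This is just to give you the general idea. You can use separate_rows to put each code on its own row, then inner_join (or left_join) against the lookup table, and finally group_by and summarise to collapse the colors back into one string per ID.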
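library(dplyr)
library(tidyr)

# Example data: one row per id, codes stored as a comma-separated string
a <- data.frame(id = 1:4,
                code = c("W002, W103, W111",
                         "W002, W104",
                         "W103, W111, W202",
                         "W202, W103"),
                stringsAsFactors = FALSE)
# Lookup table: id holds the Wxxx code, code holds the color
a2 <- data.frame(id = c("W002", "W103", "W111", "W202"),
                 code = c("Blue", "Red", "Green", "Orange"),
                 stringsAsFactors = FALSE)

# Split the comma-separated string into one row per code
a <- a %>% separate_rows(code, sep = ', ')
# Match each code against the lookup ids; the color column is then used as code.y
a3 <- inner_join(a, a2, by = c('code' = 'id'))
# Collapse the colors back into one comma-separated string per id
a3 <- a3 %>% group_by(id) %>% summarise(code = paste0(code.y, collapse = ', '))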
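One thing to watch for: W104 has no entry in the lookup table, so an inner join silently drops it and ID 2 ends up with only "Blue". If you would rather keep unmatched codes visible, a left join is one option. The following is only a sketch under a few assumptions: the data frame names codes and colors and the column name color are made up for illustration, and falling back to the raw code for unmatched entries is just one possible choice.

```r
library(dplyr)
library(tidyr)

# Hypothetical names; same example data as in the question
codes <- data.frame(
  id = 1:4,
  code = c("W002, W103, W111", "W002, W104", "W103, W111, W202", "W202, W103"),
  stringsAsFactors = FALSE
)
colors <- data.frame(
  code = c("W002", "W103", "W111", "W202"),
  color = c("Blue", "Red", "Green", "Orange"),
  stringsAsFactors = FALSE
)

result <- codes %>%
  # One row per code; the regex tolerates any amount of space after the comma
  separate_rows(code, sep = ",\\s*") %>%
  # Keep every code, even those missing from the lookup (color is NA there)
  left_join(colors, by = "code") %>%
  # Assumption: fall back to the raw code when no color is known
  mutate(color = coalesce(color, code)) %>%
  group_by(id) %>%
  summarise(code = paste(color, collapse = ", "))

print(result)
```

With this data, ID 2 comes out as "Blue, W104" instead of losing the unmatched code, which makes gaps in the lookup table easy to spot.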