I have two data frames (DF1 and DF2):
DF1 <- as.data.frame(c("A, B","C","A","C, D"))
names(DF1) <- c("parties")
DF1
parties
A, B
C
A
C, D
library(dplyr)  # provides bind_cols()
B <- as.data.frame(c(LETTERS[1:10]))
C <- as.data.frame(1:10)
DF2 <- bind_cols(B,C)
names(DF2) <- c("party","party.number")
DF2
party party.number
A 1
B 2
C 3
D 4
E 5
F 6
G 7
H 8
I 9
J 10
The desired result is an additional column in DF1 that holds, for each row of DF1, the party numbers looked up from DF2.
Desired result (based on DF1):
parties party.numbers
A, B 1, 2
C 3
A 1
C, D 3, 4
I strongly suspect the answer involves str_match(DF1$party, DF2$party.number) or a similar regular expression, but I can't figure out how to put two (or more) party numbers in the same row (DF2$party.numbers).
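Answer 0 (score: 1)
One option is gsubfn: match each uppercase letter as the pattern and use a key/value list as the replacement, so every matched party letter is swapped for its number.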
library(gsubfn)
DF1$party.numbers <- gsubfn("[A-Z]", setNames(as.list(DF2$party.number),
DF2$party), as.character(DF1$parties))
DF1
# parties party.numbers
#1 A, B 1, 2
#2 C 3
#3 A 1
#4 C, D 3, 4
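For comparison, the same key/value lookup can be sketched in base R without gsubfn. This is only an illustration, not part of the answer above: it assumes the parties are always separated by ", ", it rebuilds the example data with character columns, and the names `lookup` and the anonymous helper are made up for this sketch.

```r
# Rebuild the example data with plain character columns (an assumption of this sketch).
DF1 <- data.frame(parties = c("A, B", "C", "A", "C, D"), stringsAsFactors = FALSE)
DF2 <- data.frame(party = LETTERS[1:10], party.number = 1:10, stringsAsFactors = FALSE)

# Named vector used as the key/value table: names are parties, values are numbers.
lookup <- setNames(DF2$party.number, DF2$party)

# Split each row on ", ", look up every piece, and paste the numbers back together.
DF1$party.numbers <- vapply(
  strsplit(DF1$parties, ", ", fixed = TRUE),
  function(p) paste(lookup[p], collapse = ", "),
  character(1)
)
DF1
```

Because this version splits on the separator instead of matching single uppercase letters, it should also cope with party names longer than one character. A party that is missing from DF2 would show up as "NA" in the pasted string.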
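Answer 1 (score: 1)
An alternative solution using tidyverse. You can reshape DF1 so that each row holds a single party, join it to DF2, and then collapse it back to its original form: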
library(tidyverse)
DF1 <- as.data.frame(c("A, B","C","A","C, D"))
names(DF1) <- c("parties")
B <- as.data.frame(c(LETTERS[1:10]))
C <- as.data.frame(1:10)
DF2 <- bind_cols(B,C)
names(DF2) <- c("party","party.number")
DF1 %>%
group_by(id = row_number()) %>%
separate_rows(parties) %>%
left_join(DF2, by=c("parties"="party")) %>%
summarise(parties = paste(parties, collapse = ", "),
party.numbers = paste(party.number, collapse = ", ")) %>%
select(-id)
# # A tibble: 4 x 2
# parties party.numbers
# <chr> <chr>
# 1 A, B 1, 2
# 2 C 3
# 3 A 1
# 4 C, D 3, 4
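To make the reshape step easier to follow, here is a minimal sketch that stops right after the join and before the rows are collapsed again. It assumes dplyr and tidyr are installed, rebuilds the example data with character columns, and uses `mutate(id = row_number())` instead of `group_by()` purely for illustration; `long` is a name invented for this sketch.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt with character columns (an assumption of this sketch).
DF1 <- data.frame(parties = c("A, B", "C", "A", "C, D"), stringsAsFactors = FALSE)
DF2 <- data.frame(party = LETTERS[1:10], party.number = 1:10, stringsAsFactors = FALSE)

long <- DF1 %>%
  mutate(id = row_number()) %>%                 # remember which original row each party came from
  separate_rows(parties, sep = ", ") %>%        # one party per row
  left_join(DF2, by = c("parties" = "party"))   # attach party.number to each party
long
```

At this point `long` has one row per party (six rows for the example data), and `id` records the original DF1 row, which is what lets the final `summarise()` paste the parties and numbers back together per row.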