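I have two data frames. I want to match the items in df1 against the items in df2 and, where there is a match, prepend a letter to the matching number in that row of df1. I have written some code, but I'm not sure how to proceed. Any pointers would be helpful. Thanks.
Here is a sample of df1: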
S.no number
1 122, apple, 22, banana
2 145, 20, 45
3 212, grapes, 33
4 250, sugar, 43
The items to be matched come from this data frame, df2:
S.no number
1 122
2 186
3 212
4 250
5 111
6 45
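For anyone who wants to reproduce this, the two frames above can be rebuilt roughly as follows. This is only a sketch: the column types are an assumption (character for df1$number, numeric for df2$number), since the printed tables do not show them.

```r
# Sample data copied from the tables above; the types are assumed
df1 <- data.frame(
  S.no   = 1:4,
  number = c("122, apple, 22, banana", "145, 20, 45",
             "212, grapes, 33", "250, sugar, 43"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  S.no   = 1:6,
  number = c(122, 186, 212, 250, 111, 45)
)
```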
Expected output:
S.no number
1 S122, apple, 22, banana
2 145, 20, S45
3 S212, grapes, 33
4 S250, sugar, 43
This is what I have done so far:
df1 <- df1 %>%
  mutate(ID = row_number()) %>%
  separate_rows(`number`, sep = ',') %>%
  left_join(df2, by = "S.no") %>%
  group_by(ID) %>%
I'm not sure how to continue from here.
Answer 0: (score: 2)
Using base R:
df1$number = gsub(paste0("(.*)(",paste(df2$number,collapse="|"),".*)"),"\\1S\\2",df1$number)
Sno number
1 1 S122, apple, 22, banana
2 2 145, 20, S45
3 3 S212, grapes, 33
4 4 S250, sugar, 43
Note that this does not work when a single row contains more than one match.
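If several matches per row are possible, one variant worth trying is to anchor each alternative with word boundaries (\\b) and let gsub replace every occurrence. This is a sketch on the sample data from the question (rebuilt here with assumed column types), not a drop-in guarantee: a boundary also exists next to a dot, so a token such as "45.5" would still be prefixed.

```r
# Sample data as in the question; the column types are an assumption
df1 <- data.frame(
  S.no   = 1:4,
  number = c("122, apple, 22, banana", "145, 20, 45",
             "212, grapes, 33", "250, sugar, 43"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(S.no = 1:6, number = c(122, 186, 212, 250, 111, 45))

# Build "\\b(122|186|212|250|111|45)\\b": whole tokens only, so the
# "45" inside "145" is left alone
pattern <- paste0("\\b(", paste(df2$number, collapse = "|"), ")\\b")

# gsub replaces every match in each string, not just one per row
df1$number <- gsub(pattern, "S\\1", df1$number, perl = TRUE)
```

Because the pattern no longer relies on a greedy `.*`, a row such as "122, 45" would get both numbers prefixed.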
Answer 1: (score: 2)
Here is one way to do it:
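library(tidyverse)

# Join keys must share a type, so make df2$number character
df2 = df2 %>% mutate(number = as.character(number))

df3 = df1 %>%
  # Split on a comma plus any following spaces so tokens carry no leading blank
  mutate(number = str_split(number, ",\\s*")) %>%
  unnest(number) %>%
  left_join(df2, by = "number") %>%
  # S.no.y is non-NA only for tokens that were found in df2
  mutate(number = ifelse(!is.na(S.no.y), paste0("S", number), number)) %>%
  group_by(S.no.x) %>%
  # Glue the tokens of each original row back together
  mutate(number = paste(number, collapse = ", ")) %>%
  distinct(S.no.x, .keep_all = TRUE) %>%
  ungroup() %>%
  select(S.No = S.no.x, number)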
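Answer 2: (score: 1)
Using dplyr and tidyr, we can first separate_rows on df1, left_join the result with df2, paste "S" onto the number values that have a match, and then summarise them back into one row per ID. In the code below the ID column is named Sno, so the join produces Sno.x and Sno.y.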
library(dplyr)
library(tidyr)

df1 %>%
  separate_rows(number) %>%
  left_join(df2 %>% mutate(number = as.character(number)), by = "number") %>%
  mutate(number = ifelse(is.na(Sno.y), number, paste0("S", number))) %>%
  select(-Sno.y) %>%
  group_by(Sno.x) %>%
  summarise(number = toString(number))

#  Sno.x number
#  <int> <chr>
#1     1 S122, apple, 22, banana
#2     2 145, 20, S45
#3     3 S212, grapes, 33
#4     4 S250, sugar, 43
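The same split, match and re-join idea can also be sketched in base R without a join, by comparing exact tokens with %in%. The helper name prefix_matches and its arguments are hypothetical, and the sample data is rebuilt with assumed column types and the S.no column name from the question.

```r
# Sample data as in the question; the column types are an assumption
df1 <- data.frame(
  S.no   = 1:4,
  number = c("122, apple, 22, banana", "145, 20, 45",
             "212, grapes, 33", "250, sugar, 43"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(S.no = 1:6, number = c(122, 186, 212, 250, 111, 45))

# Hypothetical helper: prefix every comma-separated token of x that
# appears in keys, then glue the tokens back together
prefix_matches <- function(x, keys, prefix = "S") {
  tokens <- strsplit(x, ",\\s*")          # one character vector per row
  vapply(tokens, function(tok) {
    hit <- tok %in% keys                  # exact token comparison
    tok[hit] <- paste0(prefix, tok[hit])
    paste(tok, collapse = ", ")
  }, character(1))
}

df1$number <- prefix_matches(df1$number, as.character(df2$number))
```

Since tokens are compared whole, "145" is never confused with "45", and every matching token in a row is prefixed.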