I have two data frames like these:
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"))
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White", "Red,Cyan,Brown",
                             "Violet", "Magenta", "Gray"))
I am trying to merge these two data frames and return the rows of df2 that are also present in df1. I also need to make sure
The desired output is:
Colors
Yellow,Pink
Green
White
Red,Cyan,Brown
Violet
Gray
If I run df <- inner_join(df2, df1), I don't get the rows Yellow,Pink and Red,Cyan,Brown.
What am I missing here? Can someone point me in the right direction?
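For context: inner_join() keeps only rows whose key values are exactly equal, so a combined string such as "Yellow,Pink" never equals any single color in df1. A minimal base R check with %in%, which applies the same whole-string equality, shows which df2 rows a plain key join can keep:

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# Whole-string comparison of each df2 value against df1 (what an equality join does)
exact_match <- df2$Colors %in% df1$Colors
df2$Colors[exact_match]
# "Green" "White" "Violet" "Gray"
```

The combined entries drop out because they are never split into their individual colors before matching, which is what each answer below does in some form.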
Answer 0 (score: 2)
A base R solution using pmatch on each split item:
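# Split each df2 entry into its individual colors
split_list <- strsplit(as.character(df2$Colors), ",")
# Keep an entry only if every one of its colors is matched in df1
# (pmatch returns NA for any color it cannot match)
keep_lgl <- sapply(split_list, function(x) !anyNA(pmatch(x, df1$Colors)))
df2[keep_lgl, , drop = FALSE]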
# Colors
# 1 Yellow,Pink
# 2 Green
# 4 White
# 5 Red,Cyan,Brown
# 6 Violet
# 8 Gray
Note: a combination of colors is matched only if all of its colors are in df1.
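One subtlety: pmatch() also accepts unique partial (prefix) matches, so an abbreviation such as "Gre" would count as Green. If only exact color names should count, a variant that swaps pmatch() for %in% might look like the sketch below. Allowing optional spaces after each comma (",\\s*") is an assumption about how the data could be typed, not something the question requires.

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# pmatch() falls back to unique prefix matching: "Gre" matches "Green" (index 3)
pmatch("Gre", df1$Colors)
# %in% requires the whole string to be equal, so this is FALSE
"Gre" %in% df1$Colors

# Split on commas, allowing optional spaces after each comma
split_list <- strsplit(df2$Colors, ",\\s*")
# Keep a row only if every one of its colors is exactly present in df1
keep_lgl <- vapply(split_list, function(x) all(x %in% df1$Colors), logical(1))
result <- df2[keep_lgl, , drop = FALSE]
result
```

On the question's data both versions return the same six rows; they differ only when an item is a prefix of a df1 color rather than a full name.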
Some tidyverse approaches:
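library(tidyverse)
df2 %>%
  mutate(keep = Colors) %>%               # remember the original entry
  separate_rows(Colors, sep = ",") %>%    # one row per single color
  add_count(keep, name = "n") %>%         # n: number of colors in each entry
  inner_join(df1, by = "Colors") %>%      # drop colors that are not in df1
  add_count(keep, name = "nn") %>%        # nn: number of colors that matched
  filter(n == nn) %>%                     # no effect on this data, but drops partly matched entries in general
  distinct(keep) %>%
  rename(Colors = keep)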
# # A tibble: 6 x 1
# Colors
# <fctr>
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
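The two add_count() calls compare how many colors an entry has (n) with how many of them survive the inner join (nn). On the question's data that comparison changes nothing, because every combined entry is fully present in df1. The base R sketch below mirrors the same counting logic on a hypothetical df2 with an extra mixed row "Pink,Gold" (not part of the question) to show why the filter matters:

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
# Hypothetical input: "Pink,Gold" is an illustrative row, not from the question
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "Pink,Gold"),
                  stringsAsFactors = FALSE)

# Long format, one row per (original entry, single color), like separate_rows()
parts <- strsplit(df2$Colors, ",")
long <- data.frame(keep = rep(df2$Colors, lengths(parts)),
                   Colors = unlist(parts), stringsAsFactors = FALSE)

# Rows that survive an exact join with df1
found <- long[long$Colors %in% df1$Colors, ]

# Without the count check, any entry with at least one match is kept
unique(found$keep)

# n: colors per entry; nn: matched colors per entry (same entry order as n)
n  <- table(long$keep)
nn <- table(factor(found$keep, levels = names(n)))
complete <- names(n)[as.vector(n == nn)]
complete
```

Here unique(found$keep) still contains "Pink,Gold", while the n == nn comparison keeps only "Green" and "Yellow,Pink".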
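df2 %>%
  mutate(keep = Colors) %>%
  separate_rows(Colors, sep = ",") %>%
  # Colors2 copies the key, so it is NA after the join for colors missing from df1
  left_join(df1 %>% mutate(Colors2 = Colors), by = "Colors") %>%
  group_by(keep) %>%
  summarize(filt = anyNA(Colors2)) %>%   # TRUE if any color of the entry is missing
  filter(!filt) %>%
  select(-filt)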
# # A tibble: 6 x 1
# keep
# <fctr>
# 1 Gray
# 2 Green
# 3 Red,Cyan,Brown
# 4 Violet
# 5 White
# 6 Yellow,Pink
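The second pipeline relies on the left join leaving Colors2 as NA for any color missing from df1, then discards every entry whose group contains an NA. A rough base R equivalent, using merge(all.x = TRUE) and tapply() in place of the dplyr verbs, might look like this sketch:

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Colors = c("Yellow,Pink", "Green", "Gold", "White",
                             "Red,Cyan,Brown", "Violet", "Magenta", "Gray"),
                  stringsAsFactors = FALSE)

# One row per (original entry, single color)
parts <- strsplit(df2$Colors, ",")
long <- data.frame(keep = rep(df2$Colors, lengths(parts)),
                   Colors = unlist(parts), stringsAsFactors = FALSE)

# Colors2 is a copy of the key, so it is NA after the join whenever a color has no match
lookup <- data.frame(Colors = df1$Colors, Colors2 = df1$Colors, stringsAsFactors = FALSE)
joined <- merge(long, lookup, by = "Colors", all.x = TRUE)

# For each original entry, TRUE if any of its colors is missing from df1
filt <- tapply(joined$Colors2, joined$keep, anyNA)
kept <- names(filt)[!as.vector(filt)]
kept
```

As in the dplyr version, grouping sorts the entries, so the result comes back in alphabetical order rather than in df2's row order.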
Answer 1 (score: 1)
You can use regex_inner_join from the fuzzyjoin package to join df1 and df2. Finally, keep the distinct values of the df2 column.
library(dplyr)
library(fuzzyjoin)
regex_inner_join(df2, df1, by=c(Colors = "Colors")) %>%
select(Colors = Colors.x) %>% distinct()
# Colors
# 1 Yellow,Pink
# 2 Green
# 3 White
# 4 Red,Cyan,Brown
# 5 Violet
# 6 Gray
# For demonstration: the joined table produced by regex_inner_join. It can then
# be reshaped into the desired format.
regex_inner_join(df2, df1, by=c(Colors = "Colors"))
# Colors.x Colors.y
# 1 Yellow,Pink Yellow
# 2 Yellow,Pink Pink
# 3 Green Green
# 4 White White
# 5 Red,Cyan,Brown Red
# 6 Red,Cyan,Brown Cyan
# 7 Red,Cyan,Brown Brown
# 8 Violet Violet
# 9 Gray Gray
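Note that the regex join keeps an entry as soon as any df1 color is found somewhere inside it, and it matches substrings rather than whole comma-separated items. Assuming regex_inner_join() treats each df1 color as a regular expression detected within the df2 string, the base R sketch below (grepl() stands in for the join, and the rows "Pink,Gold" and "Redwood" are hypothetical, not from the question) contrasts that rule with the all-colors, exact-match rule of the first answer:

```r
df1 <- data.frame(Colors = c("Yellow", "Pink", "Green", "Blue", "White", "Red",
                             "Cyan", "Brown", "Violet", "Orange", "Gray"),
                  stringsAsFactors = FALSE)
# Hypothetical rows chosen to show the difference; not from the question
df2 <- data.frame(Colors = c("Yellow,Pink", "Gold", "Pink,Gold", "Redwood"),
                  stringsAsFactors = FALSE)

# Regex-style rule: does any df1 color occur as a pattern inside the string?
any_pattern <- vapply(df2$Colors, function(s) {
  any(vapply(df1$Colors, function(p) grepl(p, s), logical(1)))
}, logical(1), USE.NAMES = FALSE)

# Item-wise rule: is every comma-separated color exactly present in df1?
all_items <- vapply(strsplit(df2$Colors, ","),
                    function(x) all(x %in% df1$Colors), logical(1))

data.frame(Colors = df2$Colors, any_pattern, all_items)
```

Under these assumptions the regex rule keeps "Pink,Gold" (Pink matches) and "Redwood" (it contains "Red"), while the item-wise rule keeps only "Yellow,Pink". On the question's data the two rules happen to agree.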