I have two tables, T1 (3 columns) and T2 (2 columns).
T1:
Name  Age  Num
John  20   a, c, b
Lily  19   d, h, e
T2:
Item    Num
pen     a, c, q, b
pencil  d, z, h, e
apple   a, c, y
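The Num columns are stored as strings. I want to check whether every value in T1$Num is contained in T2$Num and, if so, add the corresponding T2$Item to T1. The code would be something like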
T1 <- sqldf("SELECT *, T2.Item FROM T1 LEFT JOIN T2 WHERE T1.Num are all contained in T2.Num")
I should get
Name  Age  Num      Item
John  20   a, c, b  pen
Lily  19   d, h, e  pencil
Thanks for your help!

Answer 0 (score: 0)
1) Using the input shown reproducibly in the Note at the end, and assuming that the components of Num are sorted the same way in T1 and T2 (which is the case for the data in the question; we relax this assumption later), we can use replace to convert T1.Num into a like pattern and then perform a left join with T2.Num.

library(sqldf)
sqldf("select T1.*, T2.Item, T2.Num Num2 from T1
  left join T2 on T2.Num like '%' || replace(T1.Num, ', ', '%') || '%'")

giving:

  Name Age     Num   Item       Num2
1 John  20 a, c, b    pen a, c, q, b
2 Lily  19 d, h, e pencil d, z, h, e
3 Jake  10    a, d   <NA>       <NA>

If the components of Num in T1 and T2 are not sorted the same way, then first sort them as follows:

library(dplyr)
library(tidyr)

T1x <- T1 %>%
  separate_rows(Num) %>%
  arrange(Name, Num) %>%
  group_by(Name) %>%
  summarize(Num = toString(Num)) %>%
  ungroup

T2x <- T2 %>%
  separate_rows(Num) %>%
  arrange(Item, Num) %>%
  group_by(Item) %>%
  summarize(Num = toString(Num)) %>%
  ungroup

sqldf("select T1x.*, T2x.Item, T2x.Num Num2 from T1x
  left join T2x on T2x.Num like '%' || replace(T1x.Num, ', ', '%') || '%'")

2) This alternative uses dplyr and tidyr without sqldf.

T1Long <- T1 %>%
  separate_rows(Num)

T1Long %>%
  left_join(T1Long %>% count(Name), by = "Name") %>%
  left_join(T2 %>% separate_rows(Num), by = "Num") %>%
  group_by(Name, Item, n) %>%
  summarize(Num = toString(Num), Count = n()) %>%
  ungroup %>%
  filter(Count == n) %>%
  select(-Count, -n)

Note: The input in reproducible form is:
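
The reproducible input that the answer refers to does not appear in this copy. A minimal sketch of what it may have looked like, reconstructed as an assumption from the question's two tables plus the extra Jake row that shows up in the sqldf output; the exact construction (data.frame with stringsAsFactors = FALSE) is a guess, not the answerer's original code:

```r
# Assumed reconstruction of the input, not the answerer's original code.
# The Jake row is inferred from the sqldf output; it has no match in T2.
T1 <- data.frame(
  Name = c("John", "Lily", "Jake"),
  Age  = c(20, 19, 10),
  Num  = c("a, c, b", "d, h, e", "a, d"),
  stringsAsFactors = FALSE
)

# T2 is taken directly from the question.
T2 <- data.frame(
  Item = c("pen", "pencil", "apple"),
  Num  = c("a, c, q, b", "d, z, h, e", "a, c, y"),
  stringsAsFactors = FALSE
)
```

With these two data frames defined, the code in the answer can be run as written.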