I have two data frames:
DF1
+----+-------+
| Id | Title |
+----+-------+
| 1  | AAA   |
| 2  | BBB   |
| 3  | CCC   |
+----+-------+
and
DF2
+----+-----------+-----------------------------------+
| Id | Sub       | Body                              |
+----+-----------+-----------------------------------+
| 1  | some sub1 | some mail body AAA some text here |
| 2  | some sub2 | some text here BBB continues here |
| 3  | some sub3 | some text AAA present here        |
| 4  | some sub4 | AAA string is present here also   |
| 5  | some sub5 | CCC string is present here        |
+----+-----------+-----------------------------------+
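I want to match the Title column of df1 against the Body column of df2.
If a title string occurs in the Body column, the two rows should be joined, and the output data frame should be: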
DF3
+-------+-----------+-----------------------------------+
| Title | Sub       | Body                              |
+-------+-----------+-----------------------------------+
| AAA   | some sub1 | some mail body AAA some text here |
| BBB   | some sub2 | some text here BBB continues here |
| AAA   | some sub3 | some text AAA present here        |
| AAA   | some sub4 | AAA string is present here also   |
| CCC   | some sub5 | CCC string is present here        |
+-------+-----------+-----------------------------------+
Answer 0 (score: 1)
One solution might look like this, although more experienced R users may well come up with a better answer.
# set up test data
df1 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:3,
                  title = c('AAA', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continues here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))
# join data frames: for each title, keep the df2 rows whose body contains it
# (fixed = TRUE matches the title as a literal string rather than as a regex)
df.list <- lapply(seq_len(nrow(df1)), function (idx)
  cbind(title = df1$title[idx],
        df2[grepl(df1$title[idx], df2$body, fixed = TRUE), c('sub', 'body')]))
do.call('rbind', df.list)
This produces the following output:
  title       sub                              body
1   AAA some sub1 some mail body AAA some text here
3   AAA some sub3        some text AAA present here
4   AAA some sub4   AAA string is present here also
2   BBB some sub2 some text here BBB continues here
5   CCC some sub5        CCC string is present here
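Note that the approach above assumes every title matches at least one row of df2. The following is a minimal sketch of what is expected to happen otherwise; the title 'ZZZ' and the names no.hits and result are made up for illustration and are not part of the original answer.

```r
# Small stand-in for df2 (illustrative data only)
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:2,
                  sub = c('some sub1', 'some sub2'),
                  body = c('some text AAA here', 'some text BBB here'))

# 'ZZZ' is a hypothetical title that occurs in no body, so the subset has zero rows
no.hits <- df2[grepl('ZZZ', df2$body, fixed = TRUE), c('sub', 'body')]
print(nrow(no.hits))

# Binding a length-one title onto a zero-row data frame is expected to raise
# an error; tryCatch() captures it so the script keeps running
result <- tryCatch(cbind(title = 'ZZZ', no.hits), error = function(e) e)
print(inherits(result, 'error'))
```

A title without any hit therefore has to be handled explicitly, for example by skipping it.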
If we cannot rely on every title matching at least one row of df2, you may want to do something like this instead:
# set up test data ('AAA BB' does not occur in any body)
df1 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:4,
                  title = c('AAA', 'AAA BB', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continues here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))
MergeByTitle <- function(title.idx) {
  # rows of df2 whose body contains the title as a literal string
  df2.hits <- df2[grepl(df1$title[title.idx], df2$body, fixed = TRUE), c('sub', 'body')]
  # return NULL for titles without a match; rbind() skips NULL entries
  if (nrow(df2.hits) == 0) return(NULL)
  cbind(title = df1$title[title.idx], df2.hits)
}
# join data frames
df.list <- lapply(seq_len(nrow(df1)), MergeByTitle)
do.call('rbind', df.list)
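As an alternative to looping over the titles, one could build every title/body pair and then filter the pairs. This is only a sketch in base R, not the original answer's method; the names pairs, keep and df3 are illustrative. It relies on merge() returning the Cartesian product when by = NULL, so titles without a match simply drop out.

```r
df1 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:4,
                  title = c('AAA', 'AAA BB', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continues here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))

# Cartesian product of both data frames; the shared column name 'id'
# is disambiguated by merge() as id.x (from df1) and id.y (from df2)
pairs <- merge(df1, df2, by = NULL)

# For each pair, test whether the title occurs literally in the body
keep <- mapply(grepl, pairs$title, pairs$body, MoreArgs = list(fixed = TRUE))

# Keep the matching pairs, ordered as in df2, with only the requested columns
df3 <- pairs[keep, ]
df3 <- df3[order(df3$id.y), c('title', 'sub', 'body')]
print(df3)
```

The cross join creates nrow(df1) * nrow(df2) intermediate rows, so this form is probably best kept for small inputs; the lapply version avoids materializing all pairs.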