Concatenate strings in a column based on the values in another column of a data frame

Posted: 2014-05-02 12:08:54

Tags: string r dataframe concatenation

I have a data.frame with two columns of strings, as shown below.

nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune", "smoke, parchment, smell, book", "mental, past, create", "key, fortune, mask, federal", "tell, warn, slip", "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors=FALSE)

d
   nos                           ren
1  JM1             book, vend, spent
2  JM2             marigold, fortune
3  JM3 smoke, parchment, smell, book
4  JM1          mental, past, create
5  JM5   key, fortune, mask, federal
6 JM45              tell, warn, slip
7  JM3             wire, dg333, uv12
8 JM45   tell, warn, slip, furniture

I want to concatenate the strings in the ren column, grouped by the values in the nos column.

For example, in the sample data, the elements from the two rows associated with JM1 should be merged ("book, vend, spent, mental, past, create").

Also, the elements associated with JM45 should be merged, keeping only the unique words ("tell, warn, slip, furniture").

The output I want looks like this:

nos1 <- c("JM1", "JM2", "JM3", "JM5", "JM45")
ren1 <- c("book, vend, spent, mental, past, create", "marigold, fortune", "smoke, parchment, smell, book, wire, dg333, uv12", "key, fortune, mask, federal", "tell, warn, slip, furniture")
out <- data.frame(nos1, ren1, stringsAsFactors=FALSE)

out
  nos1                                             ren1
1  JM1          book, vend, spent, mental, past, create
2  JM2                                marigold, fortune
3  JM3 smoke, parchment, smell, book, wire, dg333, uv12
4  JM5                      key, fortune, mask, federal
5 JM45                      tell, warn, slip, furniture

How can I do this in R? My original data.frame has thousands of such rows.

1 answer:

Answer 0 (score: 3)

Using the plyr package, you can do this:

library(plyr)
ddply(d, .(nos), summarise, ren1=paste0(ren, collapse=", "))
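If you would rather not depend on plyr, base R's aggregate() can do the same plain concatenation. This is a swapped-in alternative, not the method from the answer above; it is a minimal sketch that assumes the sample data d from the question. Repeated words are kept, so JM45 still contains "tell, warn, slip" twice.

```r
# Sample data from the question
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune",
         "smoke, parchment, smell, book", "mental, past, create",
         "key, fortune, mask, federal", "tell, warn, slip",
         "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors = FALSE)

# Group ren by nos and paste each group's strings together with ", ".
# The extra argument collapse = ", " is passed through to paste().
out_plain <- aggregate(ren ~ nos, data = d, FUN = paste, collapse = ", ")
print(out_plain)
```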

Or, if you want only the unique values in ren1:

ddply(d, .(nos), summarise,
      ren1=paste0(unique(unlist(strsplit(ren, split=", "))), collapse=", "))
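A base R sketch of the unique-words variant, again using aggregate() instead of plyr. collapse_unique() is a hypothetical helper introduced here, not part of plyr or the original answer. It assumes the words are always separated by exactly ", " (as in the sample data), splits each group's strings on that separator, keeps the first occurrence of each word, and joins the result back together.

```r
# Sample data from the question
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune",
         "smoke, parchment, smell, book", "mental, past, create",
         "key, fortune, mask, federal", "tell, warn, slip",
         "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors = FALSE)

# Hypothetical helper: split all strings in a group on ", ",
# drop repeated words (first occurrence wins), and rejoin with ", ".
collapse_unique <- function(x) {
  words <- unlist(strsplit(x, split = ", ", fixed = TRUE))
  paste(unique(words), collapse = ", ")
}

# Apply the helper to each nos group; groups come out sorted by nos
out <- aggregate(ren ~ nos, data = d, FUN = collapse_unique)
names(out) <- c("nos1", "ren1")
print(out)
```

Because aggregate() sorts the groups by nos, JM45 comes before JM5, so the rows appear in a different order than in the out data frame from the question; the strings themselves match it.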