I have a data.frame with two columns of strings, as shown below.
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune", "smoke, parchment, smell, book", "mental, past, create", "key, fortune, mask, federal", "tell, warn, slip", "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors=FALSE)
d
nos ren
1 JM1 book, vend, spent
2 JM2 marigold, fortune
3 JM3 smoke, parchment, smell, book
4 JM1 mental, past, create
5 JM5 key, fortune, mask, federal
6 JM45 tell, warn, slip
7 JM3 wire, dg333, uv12
8 JM45 tell, warn, slip, furniture
I want to concatenate the strings in the ren column for all rows that share the same value in the nos column.
For example, in the sample data, the two rows associated with JM1 should be combined ("book, vend, spent, mental, past, create").
Likewise, the rows associated with JM45 should be combined, keeping only the unique words ("tell, warn, slip, furniture").
The output I want is shown below.
nos1 <- c("JM1", "JM2", "JM3", "JM5", "JM45")
ren1 <- c("book, vend, spent, mental, past, create", "marigold, fortune", "smoke, parchment, smell, book, wire, dg333, uv12", "key, fortune, mask, federal", "tell, warn, slip, furniture")
out <- data.frame(nos1, ren1, stringsAsFactors=FALSE)
out
nos1 ren1
1 JM1 book, vend, spent, mental, past, create
2 JM2 marigold, fortune
3 JM3 smoke, parchment, smell, book, wire, dg333, uv12
4 JM5 key, fortune, mask, federal
5 JM45 tell, warn, slip, furniture
How can I do this in R? My original data.frame has thousands of rows like these.
Answer (score: 3)
Using the plyr package, you can do it like this:
library(plyr)
ddply(d, .(nos), summarise, ren1=paste0(ren, collapse=", "))
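If you would rather not depend on plyr, base R's aggregate() can perform the same grouped concatenation. This is a minimal sketch using the formula interface, where the extra collapse argument is forwarded through aggregate()'s ... to paste(). The sample data is rebuilt so the snippet runs on its own. Note that the groups come back sorted by nos, so JM45 appears before JM5.

```r
# Rebuild the sample data from the question
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune", "smoke, parchment, smell, book",
         "mental, past, create", "key, fortune, mask, federal", "tell, warn, slip",
         "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors = FALSE)

# Group rows by nos and paste each group's ren strings together;
# collapse = ", " is passed on to paste() for every group
res <- aggregate(ren ~ nos, data = d, FUN = paste, collapse = ", ")
res
```

This keeps duplicate words (JM45 becomes "tell, warn, slip, tell, warn, slip, furniture"), exactly like the first ddply() call.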
Or, if you want only the unique values in ren1:
ddply(d, .(nos), summarise,
      ren1=paste0(unique(unlist(strsplit(ren, split=", "))), collapse=", "))
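A base-R sketch of the unique-words variant follows. The helper merge_unique is a hypothetical name used only for illustration: it splits a group's strings on ", ", drops repeated words with unique() (which keeps first occurrences in their original order), and pastes the words back together. Because aggregate() returns groups sorted by nos, the last step reorders the rows by each nos value's first appearance in d so the result matches the desired out data.frame row for row.

```r
# Rebuild the sample data from the question
nos <- c("JM1", "JM2", "JM3", "JM1", "JM5", "JM45", "JM3", "JM45")
ren <- c("book, vend, spent", "marigold, fortune", "smoke, parchment, smell, book",
         "mental, past, create", "key, fortune, mask, federal", "tell, warn, slip",
         "wire, dg333, uv12", "tell, warn, slip, furniture")
d <- data.frame(nos, ren, stringsAsFactors = FALSE)

# Hypothetical helper: merge a group's comma-separated strings, keeping each word once
merge_unique <- function(x) {
  words <- unlist(strsplit(x, split = ", ", fixed = TRUE))
  paste(unique(words), collapse = ", ")
}

out <- aggregate(ren ~ nos, data = d, FUN = merge_unique)

# aggregate() sorts groups by nos; restore the order in which each nos first appears in d
out <- out[order(match(out$nos, unique(d$nos))), ]
rownames(out) <- NULL
names(out) <- c("nos1", "ren1")
out
```

The split pattern assumes words are always separated by exactly ", "; if your real data has inconsistent spacing, a regular expression such as ",\\s*" (without fixed = TRUE) would be more forgiving.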