Concatenate strings in data.frame columns without duplicates

Time: 2014-05-09 07:57:36

Tags: r duplicates concatenation vectorization paste

I have two columns of character vectors in a data.frame d:

t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")

d <- data.frame(t1, t2, stringsAsFactors=FALSE)

d
                  t1             t2
1     vector, market vector, market
2       phone34, fax     phone, fax
3 material55, animal summer, animal
4               cave          pan23
5             monday         monday
6             fast98 fast98, ticket

I want to concatenate the two columns into a single column t3, without any duplicates.

Using paste alone gives me duplicates.

d$t3 <- paste(d$t1, d$t2, sep=", ")

> d
                  t1             t2                                 t3
1     vector, market vector, market     vector, market, vector, market
2       phone34, fax     phone, fax           phone34, fax, phone, fax
3 material55, animal summer, animal material55, animal, summer, animal
4               cave          pan23                        cave, pan23
5             monday         monday                     monday, monday
6             fast98 fast98, ticket             fast98, fast98, ticket

The desired result would be

                  t1             t2                                 t3
1     vector, market vector, market                     vector, market
2       phone34, fax     phone, fax                phone34, phone, fax
3 material55, animal summer, animal         material55, animal, summer
4               cave          pan23                        cave, pan23
5             monday         monday                             monday
6             fast98 fast98, ticket                     fast98, ticket

How can I do this efficiently in R? Is there a vectorized solution?

1 answer:

Answer 0: (score: 3)

You need to strsplit each entry of each vector, take the union of the resulting vectors, and paste them back together:

strsplit(d$t1, split=", ") -> t1s   ## list of vectors
strsplit(d$t2, split=", ") -> t2s   ## list of vectors

# do a union of the elements and paste them together to get a single string
# iterate over the split lists (not the global t1), so this works from d alone
d$t3 <- sapply(seq_along(t1s), function(i) paste(union(t1s[[i]], t2s[[i]]), collapse=", "))
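
The same strsplit / union / paste idea can also be sketched with mapply, which walks both split lists in parallel and avoids indexing by position. This is only a minimal sketch using the sample data from the question; merge_unique is a hypothetical helper name, not a base R function, and it assumes the separator is exactly ", ".

```r
# Sample data copied from the question
t1 <- c("vector, market", "phone34, fax", "material55, animal", "cave", "monday", "fast98")
t2 <- c("vector, market", "phone, fax", "summer, animal", "pan23", "monday", "fast98, ticket")
d <- data.frame(t1, t2, stringsAsFactors = FALSE)

# merge_unique is a hypothetical helper (an assumption, not part of base R).
# It splits each entry on `sep`, takes the union of the two token vectors
# row by row, and pastes the result back into a single string.
merge_unique <- function(a, b, sep = ", ") {
  mapply(
    function(x, y) paste(union(x, y), collapse = sep),
    strsplit(a, sep, fixed = TRUE),
    strsplit(b, sep, fixed = TRUE),
    USE.NAMES = FALSE
  )
}

d$t3 <- merge_unique(d$t1, d$t2)
d$t3
```

Note that union keeps tokens in order of first appearance, so row 2 comes out as "phone34, fax, phone" rather than the "phone34, phone, fax" shown in the question. The set of tokens is the same; only the order differs. Like sapply, mapply still loops over rows internally, so this is a tidier way to write the loop rather than true vectorization.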

I hope that helps.