Duplicate strings in R

Time: 2013-10-11 18:33:31

Tags: string r

I have a vector in a data.frame that looks like this:

language     
Enlish
English, Spanish
English,English
English, Spanish
English,Chinses,Spanish,English
Spanish,Chinese,Spanish
English,Spanish, Chinese
......

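The vector has more than 1000 rows, each listing one or more languages. I want to remove the duplicated languages within each row. I would like it to look like this: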

language
English,
English,Spanish,
English,
English,Spanish
English,Chinese,Spanish
Spanish,Chinese
English,Spanish, Chinese
......

I would like to get this result using R. Thanks for your help!

1 answer:

Answer 0 (score: 2)

Here is one approach:

Your data:

language <- readLines(n = 7)
Enlish
English, Spanish
English,English
English, Spanish
English,Chinses,Spanish,English
Spanish,Chinese,Spanish
English,Spanish, Chinese

Code:

lang2 <- strsplit(language, ",\\s*")
## Keep as a list of vectors (more flexible)
lapply(lang2, unique)
## Or paste it together to match your output:
sapply(lapply(lang2, unique), paste, collapse = ",")

## > sapply(lapply(lang2, unique), paste, collapse = ",")
## [1] "Enlish"                  "English,Spanish"        
## [3] "English"                 "English,Spanish"        
## [5] "English,Chinses,Spanish" "Spanish,Chinese"        
## [7] "English,Spanish,Chinese"
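
Since the question says the vector lives in a data.frame, the same idea can be applied directly to a column. This is only a minimal sketch: the data frame `df`, its sample rows, and the helper `dedupe_row` are hypothetical names introduced here for illustration, and the column is assumed to be character rather than factor.

```r
# Hypothetical data frame; `df` and its rows are assumptions for illustration.
df <- data.frame(
  language = c("English,English",
               "Spanish,Chinese,Spanish",
               "English,Spanish, Chinese"),
  stringsAsFactors = FALSE  # strsplit() needs a character vector, not a factor
)

# Hypothetical helper: drop repeated entries, keep first-seen order, re-join.
dedupe_row <- function(x) paste(unique(x), collapse = ",")

# Split each row on a comma plus optional whitespace, then de-duplicate per row.
# vapply() guarantees exactly one string back per row.
df$language <- vapply(strsplit(df$language, ",\\s*"), dedupe_row, character(1))

df$language
## [1] "English"                 "Spanish,Chinese"
## [3] "English,Spanish,Chinese"
```

`unique()` compares strings exactly, so misspellings such as "Enlish" or "Chinses" are treated as separate languages and would need to be corrected beforehand.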