I have a vector in a data.frame that looks like this:
language
Enlish
English, Spanish
English,English
English, Spanish
English,Chinses,Spanish,English
Spanish,Chinese,Spanish
English,Spanish, Chinese
......
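The vector has more than 1,000 rows containing various languages. I want to remove all duplicates within each row. I would like it to look like this: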
language
English,
English,Spanish,
English,
English,Spanish
English,Chinese,Spanish
Spanish,Chinese
English,Spanish, Chinese
......
I would like to get this result using R. Thanks for your help!
Answer 0 (score: 2)
Here is one approach:
Your data:
language <- readLines(n = 7)
Enlish
English, Spanish
English,English
English, Spanish
English,Chinses,Spanish,English
Spanish,Chinese,Spanish
English,Spanish, Chinese
Code:
lang2 <- strsplit(language, ",\\s*")
## Keep as a list of vectors (more flexible)
lapply(lang2, unique)
## Or paste it together to match your output:
sapply(lapply(lang2, unique), paste, collapse = ",")
## > sapply(lapply(lang2, unique), paste, collapse = ",")
## [1] "Enlish" "English,Spanish"
## [3] "English" "English,Spanish"
## [5] "English,Chinses,Spanish" "Spanish,Chinese"
## [7] "English,Spanish,Chinese"
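Since the question mentions that the vector lives in a data.frame, the same split / unique / paste idea can be applied to a column directly. The following is only a minimal sketch: the data frame `df`, the new column `language_unique`, and the helper `dedupe_languages` are hypothetical names chosen for illustration, not part of the original post. It uses `vapply` instead of `sapply` so the result is guaranteed to be a character vector, and `as.character` in case the column was read in as a factor.

```r
# Hypothetical data frame standing in for the asker's data (names are assumptions)
df <- data.frame(
  language = c("English,English", "Spanish,Chinese,Spanish", "English,Spanish, Chinese"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each entry on commas (plus any following whitespace),
# drop repeated languages while keeping first-seen order, and re-join with commas
dedupe_languages <- function(x) {
  parts <- strsplit(as.character(x), ",\\s*")
  vapply(parts, function(p) paste(unique(p), collapse = ","), character(1))
}

# Store the cleaned values in a new column so the original is left untouched
df$language_unique <- dedupe_languages(df$language)
```

Note that this only removes exact repeats; misspellings such as "Enlish" or "Chinses" are treated as distinct languages and would need to be corrected separately.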