How do I get the unique values of a column?

Time: 2017-10-12 12:53:57

Tags: r

Suppose I have the following data.frame:

dt=  data.frame(id=letters[1:6],city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))

> dt
  id  city
1  a   A;B
2  b   B;D
3  c A;D;G
4  d   A;C
5  e   F;G
6  f   C;D

I want to get the unique values of the city variable, like this:

city=c("A","B","C","D","F","G")

How can I do this?

2 answers:

Answer 0: (score: 2)

A cleaner solution is:

dt= data.frame(id=letters[1:6],city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))

city=strsplit(as.character(dt$city), ";")

city=sort(unique(unlist(city)))

[1] "A" "B" "C" "D" "F" "G"
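If the same operation is needed on several columns, the steps above could be wrapped in a small function. This is only a sketch: the name unique_tokens and its sep argument are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): split a delimited
# column and return its sorted unique tokens.
unique_tokens <- function(x, sep = ";") {
  # as.character() guards against factor columns
  # (the data.frame() default before R 4.0)
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # flatten the list, drop duplicates, then sort
  sort(unique(unlist(parts)))
}

dt <- data.frame(id = letters[1:6],
                 city = c("A;B", "B;D", "A;D;G", "A;C", "F;G", "C;D"))

unique_tokens(dt$city)
# [1] "A" "B" "C" "D" "F" "G"
```

Passing fixed = TRUE treats the separator as a literal string rather than a regular expression, which matters if the delimiter is a character such as "." or "|".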

Answer 1: (score: 1)

Data:

dt=  data.frame(id=letters[1:6],city = c("A;B","B;D","A;D;G","A;C","F;G","C;D"))

> dt
  id  city
1  a   A;B
2  b   B;D
3  c A;D;G
4  d   A;C
5  e   F;G
6  f   C;D

Split the city column with strsplit, using as.character to convert it to character strings first:

city <- unlist(strsplit(as.character(dt$city), ";", fixed = TRUE))

> city
 [1] "A" "B" "B" "D" "A" "D" "G" "A" "C" "F" "G" "C" "D"
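The as.character call matters because data.frame() converted strings to factors by default before R 4.0, and strsplit rejects non-character input. A minimal illustration, assuming a factor column (the variable names here are made up for the example):

```r
# A factor, as the city column would be under stringsAsFactors = TRUE
city_factor <- factor(c("A;B", "B;D", "A;D;G"))

# strsplit() refuses a factor; capture the error instead of stopping
outcome <- tryCatch(
  strsplit(city_factor, ";", fixed = TRUE),
  error = function(e) "error"
)
outcome
# [1] "error"

# Converting to character first makes the split work
pieces <- unlist(strsplit(as.character(city_factor), ";", fixed = TRUE))
pieces
# [1] "A" "B" "B" "D" "A" "D" "G"
```

On R 4.0 and later the column is already character by default, so the conversion is harmless but no longer required.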

Now use unique and order to get the output:

city <- unique(city)

> city
[1] "A" "B" "D" "G" "C" "F"

city <- city[order(city)]

> city
[1] "A" "B" "C" "D" "F" "G"

> dput(city)
c("A", "B", "C", "D", "F", "G")

Edit: updated with the OP's new data.

Edit 2: updated to drop sapply, since strsplit is apparently already vectorized. Thanks @Cris!
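To illustrate the point about vectorization: strsplit accepts a whole character vector and returns a list with one element per input string. The earlier sapply version is not shown in this thread, so the element-by-element loop below is only an assumed stand-in for comparison.

```r
x <- c("A;B", "B;D", "A;D;G")

# Vectorized call: one list element per input string
direct <- strsplit(x, ";", fixed = TRUE)
length(direct)
# [1] 3

# Assumed element-by-element equivalent (stand-in for the removed sapply code)
looped <- lapply(x, function(s) strsplit(s, ";", fixed = TRUE)[[1]])

identical(direct, looped)
# [1] TRUE
```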