How to create a dictionary of unique values in R

Date: 2016-09-26 04:50:54

Tags: r dictionary

I have a column in a data frame where each row can hold several comma-separated values:

           fruits
  1   apple,banana
  2 banana,peaches
  3        peaches
  4          mango

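Is there a way to create a dictionary of the unique values in fruits? That is, something that produces a new fruits entry holding the values: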

 fruits = apple,banana,peaches,mango

Update: I need the values as columns, not just a list of unique values, so that I can build a final data frame like this:

          fruits      fruit_apple  fruit_banana  fruit_mango  fruit_peaches
 1   apple,banana          1            1             0             0
 2   banana,peaches        0            1             0             1
 3   peaches               0            0             0             1
 4   mango                 0            0             1             0

4 Answers:

Answer 0: (score: 2)

We can do this easily with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(df1, "fruits", ",", type = "character", fill = 0)
#          fruits fruits_apple fruits_banana fruits_mango fruits_peaches
#1   apple,banana            1             1            0              0
#2 banana,peaches            0             1            0              1
#3        peaches            0             0            0              1
#4          mango            0             0            1              0

Data

df1 <- structure(list(fruits = c("apple,banana", "banana,peaches", "peaches", 
"mango")), .Names = "fruits", class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Answer 1: (score: 1)

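Do you want the new column to repeat the concatenated list of unique values on every row? Sorry, the question is not entirely clear. Assuming that is the case, and that your data.frame holds character strings rather than factors: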

df <- read.delim(
text="fruits
apple,banana
banana,peaches
peaches
mango", 
sep="\n", 
header=TRUE,
stringsAsFactors=FALSE)
df
#>           fruits
#> 1   apple,banana
#> 2 banana,peaches
#> 3        peaches
#> 4          mango

df$uniquefruits <- paste0(unique(unlist(strsplit(df$fruits, split=","))), collapse=",")
df
#>           fruits               uniquefruits
#> 1   apple,banana apple,banana,peaches,mango
#> 2 banana,peaches apple,banana,peaches,mango
#> 3        peaches apple,banana,peaches,mango
#> 4          mango apple,banana,peaches,mango

Or do you mean keeping only those values from the fruits column that are not repeated anywhere else?

Update: based on the comments, I think this is what you are after:

uniquefruits <- unique(unlist(strsplit(df$fruits, split=",")))
uniquefruits
#> [1] "apple"   "banana"  "peaches" "mango"

df2 <- cbind(df, 
             sapply(uniquefruits, 
                    function(y) apply(df, 1, 
                                      function(x) as.integer(y %in% unlist(strsplit(x, split=","))))))
df2
#>           fruits apple banana peaches mango
#> 1   apple,banana     1      1       0     0
#> 2 banana,peaches     0      1       1     0
#> 3        peaches     0      0       1     0
#> 4          mango     0      0       0     1
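
Note that apply(df, 1, ...) above splits every column of each row, so it behaves as intended only while df has the single fruits column. A minimal sketch of a variant that splits just that column once and reuses the pieces (the names parts and indicators are illustrative and not part of the original answer):

```r
df <- data.frame(
  fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
  stringsAsFactors = FALSE
)

# Split the fruits column once; parts is a list with one character vector per row
parts <- strsplit(df$fruits, split = ",", fixed = TRUE)
uniquefruits <- unique(unlist(parts))

# One integer column per unique fruit: 1 if the row contains it, 0 otherwise
indicators <- sapply(uniquefruits, function(fruit) {
  vapply(parts, function(p) as.integer(fruit %in% p), integer(1))
})

# sapply() names the matrix columns after the fruits, so cbind() labels them too
df2 <- cbind(df, indicators)
df2
```

Splitting once also avoids calling strsplit() again for every fruit and every row.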

In theory you can do this with dplyr, but I could not figure out how to automate the per-column handling in a rowwise mutate (does anyone know how?):

library(dplyr)
df %>% rowwise() %>% mutate(apple    = as.integer("apple"   %in% unlist(strsplit(fruits, ","))),
                            banana   = as.integer("banana"  %in% unlist(strsplit(fruits, ","))),
                            peaches  = as.integer("peaches" %in% unlist(strsplit(fruits, ","))),
                            mango    = as.integer("mango"   %in% unlist(strsplit(fruits, ","))))
#> Source: local data frame [4 x 5]
#> Groups: <by row>
#> 
#> # A tibble: 4 x 5
#>           fruits apple banana  peaches mango
#>            <chr> <int>  <int>    <int> <int>
#> 1   apple,banana     1      1        0     0
#> 2 banana,peaches     0      1        1     0
#> 3        peaches     0      0        1     0
#> 4          mango     0      0        0     1

Answer 2: (score: 0)

In base R:

fruits <- sort(unique(unlist(strsplit(as.character(df$fruits), split=','))))
cols <- as.data.frame(matrix(rep(0, nrow(df)*length(fruits)), ncol=length(fruits)))
names(cols) <- fruits
df <- cbind.data.frame(df, cols)
df <- as.data.frame(t(apply(df, 1, function(x){fruits <- strsplit(x['fruits'], split=','); x[unlist(fruits)] <- 1;x})))

df
#           fruits apple banana mango peaches
# 1   apple,banana     1      1     0       0
# 2 banana,peaches     0      1     0       1
# 3        peaches     0      0     0       1
# 4          mango     0      0     1       0
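
Because apply() first converts the data frame to a character matrix, the indicator columns produced above end up stored as text rather than numbers. A sketch of one alternative that keeps them as integers by cross-tabulating a long row/fruit table (the names parts, long, counts, and result are illustrative; it assumes df holds only the original fruits column and that no fruit is repeated within a row):

```r
df <- data.frame(
  fruits = c("apple,banana", "banana,peaches", "peaches", "mango"),
  stringsAsFactors = FALSE
)

parts <- strsplit(as.character(df$fruits), split = ",", fixed = TRUE)

# Long format: one (row, fruit) pair for every fruit mentioned in a row
long <- data.frame(
  row   = rep(seq_along(parts), lengths(parts)),
  fruit = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate rows against fruits; the fruit columns come out sorted.
# The factor levels keep every row, even one that contributed no pairs.
counts <- table(factor(long$row, levels = seq_len(nrow(df))), long$fruit)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame() on a
# table would return the long (Var1, Var2, Freq) form instead
result <- cbind(df, as.data.frame.matrix(counts))
result
```

If a fruit could appear twice in one row, the cells would hold counts rather than 0/1 flags and would need to be capped, for example with pmin().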

Answer 3: (score: -1)

You can use the following steps:

1) Use the strsplit function to split the values on commas.

2) Flatten the resulting list of vectors into a single vector with unlist.

3) Then take the unique values of the list.fruits character vector.

Here is the solution:

# DataFrame of fruits
f <- c("apple,banana","banana,peaches","peaches","mango")
fruits <- as.data.frame(f)
# fruits dataframe
#               f
#1   apple,banana
#2 banana,peaches
#3        peaches
#4          mango
list.fruits <- unlist(strsplit(f,split=","))
unique.fruits <- unique(list.fruits)



# Result
unique.fruits
# [1] "apple"   "banana"  "peaches" "mango"
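
If something closer to a dictionary lookup is wanted, a named vector can play that role in R. A small sketch, assuming the same input as above; fruit.index is a hypothetical name, and the as.character() call guards against the column being stored as a factor, which was the default for as.data.frame() on character data before R 4.0:

```r
f <- c("apple,banana", "banana,peaches", "peaches", "mango")
fruits <- as.data.frame(f)

# as.character() keeps strsplit() working even if fruits$f is a factor
list.fruits <- unlist(strsplit(as.character(fruits$f), split = ","))
unique.fruits <- unique(list.fruits)

# Named integer vector: name = fruit, value = order of first appearance
fruit.index <- setNames(seq_along(unique.fruits), unique.fruits)

fruit.index[["peaches"]]   # look up a single key
names(fruit.index)         # list all keys
```

The integer values are only one possible choice; any vector of the same length could be attached to the fruit names in the same way.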