Dummification throws an error: 'x' must be atomic for 'sort.list'

Time: 2018-02-03 07:01:13

Tags: r data-science dummy-variable

The str() output of my data frame looks like this:

> str(categoricalVar)
'data.frame':   56660 obs. of  10 variables:
 $ FavouriteSource    : Factor w/ 3 levels "App","LF","None": 1 1 3 3 3 1 3 3 3 3 ...
 $ FavouriteSource30  : Factor w/ 3 levels "App","LF","None": 1 1 3 3 3 1 3 3 3 3 ...
 $ FavouriteSource90  : Factor w/ 3 levels "App","LF","None": 3 3 3 3 3 3 3 3 3 3 ...
 $ FavouriteSource180 : Factor w/ 3 levels "App","LF","None": 3 3 3 3 3 3 3 3 3 3 ...
 $ FavouriteSource360 : Factor w/ 3 levels "App","LF","None": 3 3 3 3 3 3 3 3 3 3 ...
 $ Favorite_GameBin   : Factor w/ 594 levels " Team Umizoomi: Street Fair Fix -Up (Explorer)",..: 262 163 388 378 378 220 253 378 378 378 ...
 $ Favorite_GameBin30 : Factor w/ 309 levels "1-2-3 Dora!",..: 191 191 191 191 191 191 191 191 191 191 ...
 $ Favorite_GameBin90 : Factor w/ 332 levels "1-2-3 Dora!",..: 206 206 206 206 206 206 206 206 206 206 ...
 $ Favorite_GameBin180: Factor w/ 363 levels "1-2-3 Dora!",..: 226 226 226 226 226 226 226 226 226 226 ...
 $ Favorite_GameBin360: Factor w/ 449 levels " Team Umizoomi: Street Fair Fix -Up (Explorer)",..: 283 283 283 283 283 283 283 283 283 283 ...
> 

I am trying to dummify these columns, but the call throws the following error:

> categoricalVar_dummy <- dummy(categoricalVar)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

What am I doing wrong?

1 answer:

Answer 0: (score: 0)

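Here are two solutions using the dummies package. I cannot tell from your question whether your dummy() call comes from the dummies package; the examples below assume it does.

First, some data: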

categoricalVar <- data.frame(
          FavouriteSource = c('bar', 'foo', 'foo', 'foobar', 'foo', 'foo'),
          FavouriteSource30 = c('A', 'C', 'C', 'B', 'B', 'A')); categoricalVar
#>   FavouriteSource FavouriteSource30
#> 1             bar                 A
#> 2             foo                 C
#> 3             foo                 C
#> 4          foobar                 B
#> 5             foo                 B
#> 6             foo                 A

Then load dummies:

# install.packages(c("dummies"), dependencies = TRUE)
library(dummies)

Here is the dummy.data.frame() approach, which dummifies every column of the data frame in one call:

dummy.data.frame(categoricalVar)
#>   FavouriteSourcebar FavouriteSourcefoo FavouriteSourcefoobar FavouriteSource30A
#> 1                  1                  0                     0                  1
#> 2                  0                  1                     0                  0
#> 3                  0                  1                     0                  0
#> 4                  0                  0                     1                  0
#> 5                  0                  1                     0                  0
#> 6                  0                  1                     0                  1
#>   FavouriteSource30B FavouriteSource30C
#> 1                  0                  0
#> 2                  0                  1
#> 3                  0                  1
#> 4                  1                  0
#> 5                  1                  0
#> 6                  0                  0
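
If the dummies package is not available, a similar wide layout can be sketched in base R with model.matrix(). This is a swapped-in technique, not what dummy.data.frame() does internally. The name dummies_base is made up for illustration, and stringsAsFactors = TRUE is set explicitly on the assumption that the columns should be factors regardless of R version.

```r
# Same toy data as above, with the columns forced to factors.
categoricalVar <- data.frame(
  FavouriteSource   = c("bar", "foo", "foo", "foobar", "foo", "foo"),
  FavouriteSource30 = c("A", "C", "C", "B", "B", "A"),
  stringsAsFactors  = TRUE
)

# Build one indicator matrix per column. Dropping the intercept
# (intercept = FALSE, i.e. "~ column - 1") gives every level its own column.
dummy_list <- lapply(names(categoricalVar), function(nm) {
  model.matrix(reformulate(nm, intercept = FALSE), data = categoricalVar)
})

# Bind the per-column matrices side by side into one wide matrix.
dummies_base <- do.call(cbind, dummy_list)
colnames(dummies_base)
```

Each row of dummies_base should contain exactly one 1 per original column. With factors as large as Favorite_GameBin (594 levels) the result gets very wide, so it may be worth checking the number of levels before dummifying.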

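Alternatively, as Sathish suggests in the comment above, apply dummy() to each column with lapply(). This returns a list with one indicator matrix per column: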

lapply(categoricalVar, dummy)
#> $FavouriteSource
#>      categoricalVarbar categoricalVarfoo categoricalVarfoobar
#> [1,]                 1                 0                    0
#> [2,]                 0                 1                    0
#> [3,]                 0                 1                    0
#> [4,]                 0                 0                    1
#> [5,]                 0                 1                    0
#> [6,]                 0                 1                    0
#> 
#> $FavouriteSource30
#>      categoricalVarA categoricalVarB categoricalVarC
#> [1,]               1               0               0
#> [2,]               0               0               1
#> [3,]               0               0               1
#> [4,]               0               1               0
#> [5,]               0               1               0
#> [6,]               1               0               0
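
A likely explanation for the original error, assuming dummy() expects a single vector: a data frame is a list of columns, not an atomic vector, so any internal sorting step fails when it is handed the whole data frame. The base R sketch below only illustrates that distinction. It does not call dummy() itself, and msg is a name chosen for illustration.

```r
# Same toy data as above, with the columns forced to factors.
categoricalVar <- data.frame(
  FavouriteSource   = c("bar", "foo", "foo", "foobar", "foo", "foo"),
  FavouriteSource30 = c("A", "C", "C", "B", "B", "A"),
  stringsAsFactors  = TRUE
)

is.list(categoricalVar)                    # a data frame is a list of columns
is.atomic(categoricalVar)                  # so it is not an atomic vector
is.atomic(categoricalVar$FavouriteSource)  # a single factor column is atomic

# sort.list() refuses non-atomic input; capture the message instead of stopping.
msg <- tryCatch(
  sort.list(as.list(categoricalVar)),
  error = function(e) conditionMessage(e)
)
msg
```

This is why passing one column at a time, as lapply(categoricalVar, dummy) does, avoids the error.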