Across all columns

Time: 2016-07-16 16:29:05

Tags: r dummy-data model.matrix

I have basketball lineup data in five columns, and every column draws on the same set of factor levels, like this:

head(dat)
              V1             V2            V3            V4              V5
1   MILES,KEATON KINGSLEY,MOSES  BELL,ANTHLON HANNAHS,DUSTY   DURHAM,JABRIL
2   MILES,KEATON KINGSLEY,MOSES  BELL,ANTHLON HANNAHS,DUSTY   DURHAM,JABRIL
3 KINGSLEY,MOSES   BELL,ANTHLON HANNAHS,DUSTY DURHAM,JABRIL   THOMPSON,TREY
4 KINGSLEY,MOSES   BELL,ANTHLON HANNAHS,DUSTY THOMPSON,TREY     BEARD,ANTON
5  THOMPSON,TREY    BEARD,ANTON KOUASSI,WILLY   WHITT,JIMMY WATKINS,MANUALE
6  THOMPSON,TREY    BEARD,ANTON KOUASSI,WILLY   WHITT,JIMMY WATKINS,MANUALE

What I want is for each row to become a dummy coding of the factor levels present in that row, like this:

MILES,KEATON  KINGSLEY,MOSES  BELL,ANTHLON  HANNAHS,DUSTY  DURHAM,JABRIL THOMPSON,TREY  BEARD,ANTON  KOUASSI,WILLY  WHITT,JIMMY  WATKINS,MANUALE
           1               1             1              1              1             0            0               0             0               0
           1               1             1              1              1             0            0               0             0               0
           0               1             1              1              1             1            0               0             0               0

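However, model.matrix seems to work on only one column at a time; it won't let me share the full set of factor levels across multiple columns. Following some suggestions in [this thread][1], I tried: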

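library(Matrix)  # provides sparse.model.matrix() and cBind()
df <- as.data.frame(lapply(dat,as.factor))
# One no-intercept formula per column (~V1 - 1, ~V2 - 1, ...)
fList <- lapply(names(df),reformulate,intercept=FALSE)
# One sparse dummy matrix per column
mList <- lapply(fList,sparse.model.matrix,data=df)
# cBind() is deprecated in newer Matrix versions, where cbind() handles sparse matrices
br <- do.call(cBind,mList)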
head(br)
6 x 31 sparse Matrix of class "dgCMatrix"
   [[ suppressing 31 column names ‘V1BEARD,ANTON’, ‘V1BELL,ANTHLON’, ‘V1KINGSLEY,MOSES’ ... ]]

1 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
2 . . . 1 . . . . 1 . . 1 . . . . . . 1 . . . . . . 1 . . . . .
3 . . 1 . . . 1 . . . . . . 1 . . . 1 . . . . . . . . . . . 1 .
4 . . 1 . . . 1 . . . . . . 1 . . . . . . . 1 . . 1 . . . . . .
5 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1
6 . . . . 1 1 . . . . . . . . 1 . . . . . . . . 1 . . . . . . 1

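This combines the column names with the factor names, so the same player ends up in a separate dummy column for each of V1 to V5. What should I do?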

1 answer:

Answer 0: (score: 4)

We can try mtabulate from qdapTools:

library(qdapTools)
mtabulate(as.data.frame(t(df1)))
#   BELL,ANTHLON DURHAM,JABRIL HANNAHS,DUSTY KINGSLEY,MOSES MILES,KEATON THOMPSON,TREY BEARD,ANTON KOUASSI,WILLY
#1            1             1             1              1            1             0           0             0
#2            1             1             1              1            1             0           0             0
#3            1             1             1              1            0             1           0             0
#4            1             0             1              1            0             1           1             0
#5            0             0             0              0            0             1           1             1
#6            0             0             0              0            0             1           1             1
#   WATKINS,MANUALE WHITT,JIMMY
#1               0           0
#2               0           0
#3               0           0
#4               0           0
#5               1           1
#6               1           1

Or use table from base R:

table(rep(1:nrow(df1), ncol(df1)), unlist(df1))
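
For anyone who wants to run the base R approach end to end, here is a minimal sketch. It assumes df1 holds the six rows shown in the question (rebuilt by hand below, so the data frame is a reconstruction rather than the asker's actual object), and the names players, row_id, player and tab are illustrative only. Fixing the factor levels in order of first appearance is an extra step not in the answer above; it just makes the columns come out in the order the question asked for.

```r
# Hand-built copy of the six rows printed in the question (an assumption:
# the real data set is larger and is not available here).
df1 <- data.frame(
  V1 = c("MILES,KEATON", "MILES,KEATON", "KINGSLEY,MOSES",
         "KINGSLEY,MOSES", "THOMPSON,TREY", "THOMPSON,TREY"),
  V2 = c("KINGSLEY,MOSES", "KINGSLEY,MOSES", "BELL,ANTHLON",
         "BELL,ANTHLON", "BEARD,ANTON", "BEARD,ANTON"),
  V3 = c("BELL,ANTHLON", "BELL,ANTHLON", "HANNAHS,DUSTY",
         "HANNAHS,DUSTY", "KOUASSI,WILLY", "KOUASSI,WILLY"),
  V4 = c("HANNAHS,DUSTY", "HANNAHS,DUSTY", "DURHAM,JABRIL",
         "THOMPSON,TREY", "WHITT,JIMMY", "WHITT,JIMMY"),
  V5 = c("DURHAM,JABRIL", "DURHAM,JABRIL", "THOMPSON,TREY",
         "BEARD,ANTON", "WATKINS,MANUALE", "WATKINS,MANUALE"),
  stringsAsFactors = FALSE
)

# Shared level set, in order of first appearance when reading row by row.
players <- unique(as.vector(t(as.matrix(df1))))

# unlist() stacks the columns one after another, so the row ids
# 1..nrow repeat once per column.
row_id <- rep(seq_len(nrow(df1)), times = ncol(df1))
player <- factor(unlist(df1, use.names = FALSE), levels = players)

# Cross-tabulate row id against player: one row per lineup, one column per player.
tab <- table(row_id, player)
print(tab)
```

The cells are counts, so they are 0/1 only as long as a player appears at most once per row, which holds for this lineup data. If a plain data frame is preferred over a table object, as.data.frame.matrix(tab) converts it.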
