To dummy variables and back in R

Asked: 2012-12-18 15:08:24

Tags: r vectorization dummy-data

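So, I've been using R on and off for two years now and have been trying to get my head around the idea of vectorization. Since I deal with a lot of dummy variables from multiple-response sets in surveys, I thought this case would be an interesting one to learn with.

The idea is to go from multiple responses to dummy variables (and back), for example: "Out of these 8 different chocolates, which are your favorite ones (choose up to 3)?"

Sometimes we code this as dummy variables (1 for "the person likes Cote d'Or", 0 for "the person doesn't like it"), with 1 variable per option, and sometimes as categorical variables (1 for "the person likes Cote d'Or", 2 for "the person likes Lindt", and so on), with 3 variables for the 3 choices.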

So, basically, I can end up with a matrix with rows like

1,0,0,1,0,0,1,0

or a matrix with rows like

1,4,7

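As mentioned, the idea is to get from one to the other. So far I have a looping solution for each case and a vectorized solution for going from dummy to categorical. I would appreciate any further insight into this problem, and a vectorized solution for the categorical-to-dummy step.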

DUMMY TO NOT DUMMY

vecOrig<-matrix(0,nrow=18,ncol=8)  # From this one
vecDest<-matrix(0,nrow=18,ncol=3)  # To this one

# Populating the original matrix.
# I'm pretty sure this could have been added to the definition of the matrix, 
# but I kept getting repeated numbers.
# How would you vectorize this?
vec<-c(1,0,0,1,0,0,1,0)  # Assumed: the example row shown above; 'vec' is not defined in the post
for (i in 1:length(vecOrig[,1])){
  vecOrig[i,]<-sample(vec)
}

# Now, how would you vectorize this following step... 
for(i in 1:length(vecOrig[,1])){            
  vecDest[i,]<-grep(1,vecOrig[i,])
}

# Vectorized solution, I had to transpose it for some reason.
vecDest2<-t(apply(vecOrig,1,function(x) grep(1,x)))   

NOT DUMMY TO DUMMY

matOrig<-matrix(0,nrow=18,ncol=3)  # From this one
matDest<-matrix(0,nrow=18,ncol=8)  # To this one.

# We populate the origin matrix. Same thing as the other case. 
for (i in 1:length(matOrig[,1])){         
  matOrig[i,]<-sample(1:8,3,FALSE)
}

# this works, but how to make it vectorized?
for(i in 1:length(matOrig[,1])){          
  for(j in matOrig[i,]){
    matDest[i,j]<-1
  }
}

# Not a clue of how to vectorize this one. 
# The 'model.matrix' solution doesn't look neat.

2 Answers:

Answer 0 (score: 4)

Vectorized solutions:

Dummy to not dummy

vecDest <- t(apply(vecOrig == 1, 1, which))
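
On the transpose the question asks about, here is a minimal sketch of what seems to be going on: when `apply` runs over rows, it stores each row's result as a column of its output, so the raw result has one column per respondent. The example data (`vec`, five shuffled rows, the seed) is assumed for illustration and is not from the original post.

```r
# Example row pattern taken from the question; each row is a shuffle of it
vec <- c(1, 0, 0, 1, 0, 0, 1, 0)
set.seed(1)
vecOrig <- t(replicate(5, sample(vec)))  # 5 respondents x 8 options

# apply() over rows stores each row's result as a COLUMN of the output
raw <- apply(vecOrig == 1, 1, which)
dim(raw)      # 3 5: three selections per respondent, one column each

# Transposing restores one row per respondent
vecDest <- t(raw)
dim(vecDest)  # 5 3
```

This only simplifies to a matrix because every row has exactly three ones.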

Not dummy to dummy (back to the original)

nCol <- 8

vecOrig <- t(apply(vecDest, 1, replace, x = rep(0, nCol), values = 1))
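
The same categorical-to-dummy step can also be sketched with matrix indexing instead of `apply`: indexing a matrix with a two-column matrix of (row, column) pairs sets all selected cells in a single assignment. The data below (4 respondents, the seed) is made up for illustration, and `replicate` is used here as one possible way to fill the origin matrix without an explicit loop.

```r
# Hypothetical example data: 4 respondents, each picking 3 of 8 options
set.seed(1)
nCol <- 8
matOrig <- t(replicate(4, sample(1:nCol, 3)))

matDest <- matrix(0, nrow = nrow(matOrig), ncol = nCol)

# Two-column index matrix: one (row, column) pair per selected option.
# row() gives each cell's row number; both vectors unroll column by column.
idx <- cbind(as.vector(row(matOrig)), as.vector(matOrig))
matDest[idx] <- 1

rowSums(matDest)  # every respondent has exactly 3 ones
```

Unlike the `replace` approach, this does not need a transpose, since the assignment writes straight into the destination matrix.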

Answer 1 (score: 0)

This might provide some insight for the first part:

#Create example data
set.seed(42)
vecOrig<-matrix(rbinom(20,1,0.2),nrow=5,ncol=4)

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    1    0    0    1
[3,]    0    0    1    0
[4,]    1    0    0    0
[5,]    0    0    0    0

Note that this does not assume an equal number of ones in each row (e.g., you wrote "choose up to 3").
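
To illustrate why the unequal row counts matter, here is a small sketch using the matrix printed above, typed in by hand. When rows contain different numbers of ones, `apply` can no longer simplify its result to a matrix and returns a list instead, so the `t(apply(...))` idiom no longer yields a matrix.

```r
# The example matrix printed above, entered row by row
vecOrig <- matrix(c(1, 0, 0, 1,
                    1, 0, 0, 1,
                    0, 0, 1, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 0), nrow = 5, byrow = TRUE)

res <- apply(vecOrig == 1, 1, which)
is.list(res)   # TRUE: the per-row results have different lengths
lengths(res)   # number of ones found in each row
```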

#use algebra to create position numbers
vecDest <- t(t(vecOrig)*1:ncol(vecOrig))

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    4
[2,]    1    0    0    4
[3,]    0    0    3    0
[4,]    1    0    0    0
[5,]    0    0    0    0
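
The double transpose is needed because R recycles the vector `1:ncol(vecOrig)` down the columns of a matrix, not along its rows. As a hedged alternative sketch, `col()` returns a matrix of column indices with the same shape as its argument, so an element-wise product appears to give the same result. The matrix is the one printed above, typed in by hand.

```r
vecOrig <- matrix(c(1, 0, 0, 1,
                    1, 0, 0, 1,
                    0, 0, 1, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 0), nrow = 5, byrow = TRUE)

# Approach from the answer: transpose, multiply by recycling, transpose back
viaTranspose <- t(t(vecOrig) * 1:ncol(vecOrig))

# Alternative: multiply each cell by its own column number
viaCol <- vecOrig * col(vecOrig)

all(viaTranspose == viaCol)  # TRUE
```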

Now we remove the zeros. To do that, we have to convert the object to a list.

vecDest <- split(t(vecDest), rep(1:nrow(vecDest), each = ncol(vecDest)))
lapply(vecDest,function(x) x[x>0])

$`1`
[1] 1 4

$`2`
[1] 1 4

$`3`
[1] 3

$`4`
[1] 1

$`5`
numeric(0)
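
If a rectangular result is still wanted, one option is to pad the shorter rows with `NA`. This is only a sketch, again using the example matrix typed in by hand; the names `sel` and `maxSel` are introduced here for illustration.

```r
vecOrig <- matrix(c(1, 0, 0, 1,
                    1, 0, 0, 1,
                    0, 0, 1, 0,
                    1, 0, 0, 0,
                    0, 0, 0, 0), nrow = 5, byrow = TRUE)

# Positions of the ones in each row, kept as a list
sel <- lapply(seq_len(nrow(vecOrig)), function(i) which(vecOrig[i, ] == 1))

# Pad every row to the length of the longest one, then stack the rows
maxSel <- max(lengths(sel))
vecDest <- do.call(rbind, lapply(sel, function(x) c(x, rep(NA, maxSel - length(x)))))

vecDest  # 5 x 2 matrix; rows with fewer selections end in NA
```

Using `do.call(rbind, ...)` rather than `sapply` keeps the result a matrix with one row per respondent even when `maxSel` is 1.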