Question

I have a large data frame with 2050 rows and 202 columns. I read the data from SPSS with read.spss(). The variables are factors.

library(foreign)  # read.spss() comes from the foreign package
data<-read.spss("filename.sav",to.data.frame=TRUE,reencode='utf-8')
class(data)
[1] "data.frame"
dim(data)
[1] 2050  202
class(data[1,57])
[1] "factor"
class(data$aq21a) # data$aq21a is 57th column
[1] "NULL"

Now I want to combine columns 57 to 61 (data$aq21a, data$aq21b, data$aq21c, data$aq21d, data$aq21e) into a new variable aq21, like this:

aq21<-rbind(data$aq21a,data$aq21b,data$aq21c,data$aq21d,data$aq21e)

But this does not give the desired result. I want a 10250-by-1 vector (the 5 columns of 2050 values stacked into one column).

class(aq21)
[1] "matrix"
dim(aq21)
[1]    5 2050

Sample data:

head(data[,57:60])
               bq21a               bq21b               bq21c               bq21d
1 Rich / Independent           Efficient                <NA>                <NA>
2   Known / Familiar           Efficient                <NA>                <NA>
3  Relative / Friend Educated / Academic         Accountable                <NA>
4       Truthfulness           Behaviour          Good/Great Educated / Academic
5          Behaviour   Relative / Friend Educated / Academic    Known / Familiar
6          Behaviour   Relative / Friend                <NA>                <NA>

I want a result like this:

               bq21a
1  Rich / Independent
2    Known / Familiar
3   Relative / Friend
4        Truthfulness
5           Behaviour
6           Behaviour
7           Efficient
8           Efficient
9 Educated / Academic
10          Behaviour
11  Relative / Friend
... and so on

A sample of the result I get instead, which is not what I need:

aq21[1:5,1:10]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   11   24   19   35   22   22    3   22   NA     2
[2,]    6    6   18   22   19   19   31   31   NA     5
[3,]   NA   NA    9    2   18   NA   26   NA   NA    31
[4,]   NA   NA   NA   18   24   NA   NA   NA   NA    NA
[5,]   NA   NA   NA   23   NA   NA   NA   NA   NA    NA

What is wrong with it, and how can I get the correct result?

Answer 1

Assuming the relevant columns are all factors, e.g.:

   data <- structure(list(bq21a = structure(c(4L, 2L, 3L, 5L, 1L, 1L), .Label = c("Behaviour", 
   "Known / Familiar", "Relative / Friend", "Rich / Independent", 
   "Truthfulness"), class = "factor"), bq21b = structure(c(3L, 3L, 
   2L, 1L, 4L, 4L), .Label = c("Behaviour", "Educated / Academic", 
  "Efficient", "Relative / Friend"), class = "factor"), bq21c = structure(c(NA, 
   NA, 1L, 3L, 2L, NA), .Label = c("Accountable", "Educated / Academic", 
  "Good/Great"), class = "factor"), bq21d = structure(c(NA, NA, 
   NA, 1L, 2L, NA), .Label = c("Educated / Academic", "Known / Familiar"
   ), class = "factor")), .Names = c("bq21a", "bq21b", "bq21c", 
   "bq21d"), class = "data.frame", row.names = c("1", "2", "3", 
   "4", "5", "6"))

   as.numeric(data[,1])
   #[1] 4 2 3 5 1 1
   as.numeric(data[,2])
   #[1] 3 3 2 1 4 4
   as.numeric(data[,3])
   #[1] NA NA  1  3  2 NA

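When you rbind factors, the result is built from the factors' underlying integer codes (the same values as.numeric() returns above), not from their labels: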

  rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d) 
  #    [,1] [,2] [,3] [,4] [,5] [,6]
  #[1,]    4    2    3    5    1    1
  #[2,]    3    3    2    1    4    4
  #[3,]   NA   NA    1    3    2   NA
  #[4,]   NA   NA   NA    1    2   NA
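A minimal sketch of the same effect on two made-up factor vectors (the names a and b and their values are hypothetical, not the asker's data). rbind() keeps only the integer codes, while converting with as.character() first keeps the labels:

```r
# Two small factor vectors standing in for data$bq21a and data$bq21b
# (hypothetical values, not the asker's real data)
a <- factor(c("Behaviour", "Efficient", NA))
b <- factor(c("Efficient", "Truthfulness", "Behaviour"))

# rbind() drops the factor attributes and keeps only the integer codes
m_codes <- rbind(a, b)
print(m_codes)

# Converting each factor to character first keeps the labels
m_labels <- rbind(as.character(a), as.character(b))
print(m_labels)
```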

If you read the data with read.table or read.csv, you can pass stringsAsFactors=FALSE so the columns stay character instead of factor. In that case:

 dat1 <- data.frame(bq21a=c(rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d)))
 head(dat1)
 #           bq21a
 #1 Rich / Independent
 #2          Efficient
 #3               <NA>
 #4               <NA>
 #5   Known / Familiar
 #6          Efficient
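A small sketch of that route, assuming the data were available as CSV text (csv_text is a hypothetical stand-in for a file on disk; the asker's data actually came from SPSS):

```r
# Hypothetical CSV text standing in for a file on disk
csv_text <- "bq21a,bq21b
Rich / Independent,Efficient
Known / Familiar,Efficient
Relative / Friend,Educated / Academic"

# stringsAsFactors = FALSE keeps text columns as character vectors
# (this is already the default in R >= 4.0.0)
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(df)

# With character columns, rbind() keeps the labels; c() then reads the
# 2 x 3 matrix column by column, so values from the two columns alternate
interleaved <- c(rbind(df$bq21a, df$bq21b))
print(interleaved)
```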

Update

Or try:

   dat2 <- data.frame(bq21a=c(t(data))) #wouldn't matter if the columns are `factors`
   #in your dataset the code would be
   #dat2 <- data.frame(bq21a= c(t(data[,57:61])))

   identical(dat1, dat2)
   #[1] TRUE
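Both dat1 and dat2 interleave the columns row by row, whereas the desired output in the question stacks whole columns one after another. A sketch on a hypothetical two-column stand-in for data[,57:61] showing the difference; dropping t() makes c() stack the columns instead:

```r
# Small stand-in for data[, 57:61]: two factor columns (hypothetical values)
df <- data.frame(
  bq21a = factor(c("Rich / Independent", "Known / Familiar", "Behaviour")),
  bq21b = factor(c("Efficient", "Efficient", NA))
)

# t() goes through as.matrix(), which turns factors into character;
# c() then reads the transposed matrix column-wise, i.e. df row by row
row_wise <- c(t(df))

# Without t(), c() reads as.matrix(df) column by column,
# so each column is stacked below the previous one
col_wise <- c(as.matrix(df))

print(row_wise)
print(col_wise)
```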

Answer 2

As suggested by Ananda and akrun, I used the following code:

aq21<-data.frame(Col=unlist(data[,57:61]))
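Why this works, sketched on a hypothetical stand-in for data[,57:61]: a data frame is a list of columns, and unlist() concatenates them column after column. When every column is a factor, unlist() returns a factor whose levels are the union of the columns' levels, so the labels are kept instead of being reduced to integer codes as with rbind(). The use.names = FALSE argument is an optional addition that only drops the per-element names.

```r
# Stand-in for data[, 57:61] with factor columns (hypothetical values)
df <- data.frame(
  bq21a = factor(c("Rich / Independent", "Behaviour")),
  bq21b = factor(c("Efficient", NA))
)

# unlist() concatenates the columns in order; because all columns are
# factors, the result is a factor with the union of their levels
stacked <- unlist(df, use.names = FALSE)

aq21 <- data.frame(Col = stacked)
print(aq21)
nrow(aq21)  # nrow(df) * ncol(df), here 4
```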

This solved the problem, but I still don't understand why rbind didn't work.

Do we have to use unlist instead of rbind for this to work?

R: combine different columns of a data frame into a single column
