I have a large data frame with 2050 rows and 202 columns. I read the data from SPSS with read.spss(). The variables are factors.
data<-read.spss("filename.sav",to.data.frame=TRUE,reencode='utf-8')
class(data)
[1] "data.frame"
dim(data)
[1] 2050 202
class(data[1,57])
[1] "factor"
class(data$aq21a) # data$aq21a is 57th column
[1] "NULL"
Now I want to stack columns 57 to 61 (data$aq21a, data$aq21b, data$aq21c, data$aq21d, data$aq21e) into a single new variable aq21, as follows:
aq21<-rbind(data$aq21a,data$aq21b,data$aq21c,data$aq21d,data$aq21e)
But this does not give the desired result. I want a 10250 by 1 vector.
class(aq21)
[1] "matrix"
dim(aq21)
[1] 5 2050
Sample data:
head(data[,57:60])
               bq21a               bq21b               bq21c               bq21d
1 Rich / Independent           Efficient                <NA>                <NA>
2   Known / Familiar           Efficient                <NA>                <NA>
3  Relative / Friend Educated / Academic         Accountable                <NA>
4       Truthfulness           Behaviour          Good/Great Educated / Academic
5          Behaviour   Relative / Friend Educated / Academic    Known / Familiar
6          Behaviour   Relative / Friend                <NA>                <NA>
I want a result like this:
bq21a
1 Rich / Independent
2 Known / Familiar
3 Relative / Friend
4 Truthfulness
5 Behaviour
6 Behaviour
7 Efficient
8 Efficient
9 Educated / Academic
10 Behaviour
11 Relative / Friend
... and so on
A sample of the result I actually get, which is not what I want:
aq21[1:5,1:10]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   11   24   19   35   22   22    3   22   NA     2
[2,]    6    6   18   22   19   19   31   31   NA     5
[3,]   NA   NA    9    2   18   NA   26   NA   NA    31
[4,]   NA   NA   NA   18   24   NA   NA   NA   NA    NA
[5,]   NA   NA   NA   23   NA   NA   NA   NA   NA    NA
What is wrong with this, and how can I get the correct result?
Answer 0 (score: 1)
Assuming the relevant columns are all factors, for example:
data <- structure(list(bq21a = structure(c(4L, 2L, 3L, 5L, 1L, 1L), .Label = c("Behaviour",
"Known / Familiar", "Relative / Friend", "Rich / Independent",
"Truthfulness"), class = "factor"), bq21b = structure(c(3L, 3L,
2L, 1L, 4L, 4L), .Label = c("Behaviour", "Educated / Academic",
"Efficient", "Relative / Friend"), class = "factor"), bq21c = structure(c(NA,
NA, 1L, 3L, 2L, NA), .Label = c("Accountable", "Educated / Academic",
"Good/Great"), class = "factor"), bq21d = structure(c(NA, NA,
NA, 1L, 2L, NA), .Label = c("Educated / Academic", "Known / Familiar"
), class = "factor")), .Names = c("bq21a", "bq21b", "bq21c",
"bq21d"), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
as.numeric(data[,1])
#[1] 4 2 3 5 1 1
as.numeric(data[,2])
#[1] 3 3 2 1 4 4
as.numeric(data[,3])
#[1] NA NA 1 3 2 NA
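When you rbind factors, the factor class is dropped and you get the underlying integer codes instead of the labels, i.e. the same numbers that as.numeric returns above.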
rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d)
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    4    2    3    5    1    1
#[2,]    3    3    2    1    4    4
#[3,]   NA   NA    1    3    2   NA
#[4,]   NA   NA   NA    1    2   NA
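To make the difference concrete, here is a minimal sketch that binds the same columns once as factors and once after converting them to character. The data frame df and its values are a small hypothetical stand-in, not the asker's data.

```r
# Hypothetical stand-in for two of the factor columns
df <- data.frame(
  bq21a = factor(c("Rich / Independent", "Known / Familiar", "Behaviour")),
  bq21b = factor(c("Efficient", "Efficient", NA))
)

# rbind() discards the factor class, so this matrix holds integer codes
codes <- rbind(df$bq21a, df$bq21b)

# Converting each column to character first keeps the labels
labels <- rbind(as.character(df$bq21a), as.character(df$bq21b))
```

Here codes is a numeric matrix while labels is a character matrix of the same shape, so the as.character() step is what preserves the text.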
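Alternatively, you could pass stringsAsFactors=FALSE when reading the data with read.csv or read.table, so that the columns come in as character rather than factor. If that is the case: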
dat1 <- data.frame(bq21a=c(rbind(data$bq21a, data$bq21b, data$bq21c, data$bq21d)))
head(dat1)
# bq21a
#1 Rich / Independent
#2 Efficient
#3 <NA>
#4 <NA>
#5 Known / Familiar
#6 Efficient
Or try:
dat2 <- data.frame(bq21a=c(t(data))) #wouldn't matter if the columns are `factors`
#in your dataset the code would be
#dat2 <- data.frame(bq21a= c(t(data[,57:61])))
identical(dat1, dat2)
#[1] TRUE
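Note that both versions above interleave the values row by row (row 1 of every column, then row 2, and so on), whereas the desired output in the question lists all of bq21a first, then all of bq21b. A minimal sketch of that column-by-column stacking, using a small hypothetical data frame df in place of the real columns:

```r
# Hypothetical stand-in for a few of the factor columns
df <- data.frame(
  bq21a = factor(c("Rich / Independent", "Known / Familiar", "Relative / Friend")),
  bq21b = factor(c("Efficient", "Efficient", "Educated / Academic")),
  bq21c = factor(c(NA, NA, "Accountable"))
)

# Convert each column to character, then concatenate column after column
stacked <- data.frame(
  bq21a = unlist(lapply(df, as.character), use.names = FALSE),
  stringsAsFactors = FALSE
)
```

On the real data the same idea would presumably be applied to data[, 57:61] instead of df, giving 2050 * 5 = 10250 rows.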
Answer 1 (score: 0)