Creating a contingency table from a CSV in R

Time: 2015-11-11 02:55:56

Tags: r ca

I am using the ca package for correspondence analysis. Running the analysis on the package's author data works fine.

library(ca)
head(author[,1:5])
                               a   b   c   d    e
three daughters (buck)       550 116 147 374 1015
drifters (michener)          515 109 172 311  827
lost world (clark)           590 112 181 265  940
east wind (buck)             557 129 128 343  996
farewell to arms (hemingway) 589  72 129 339  866
sound and fury 7 (faulkner)  541 109 136 228  763

str(author)
 num [1:12, 1:26] 550 515 590 557 589 541 517 592 576 557 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:12] "three daughters (buck)" "drifters (michener)" "lost world (clark)" "east wind (buck)" ...
  ..$ : chr [1:26] "a" "b" "c" "d" ...

ca(author[,1:5])

 Principal inertias (eigenvalues):
           1        2        3        4       
Value      0.008122 0.001307 0.001072 0.000596
Percentage 73.19%   11.78%   9.66%    5.37%   

...

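I then wrote the author data to a CSV file and read it back in to run the analysis again. This time ca fails. The str of the data read from the CSV is different: it is a data frame whose first column holds the titles, not a contingency table, so the ca function raises an error.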

author1 <- read.csv("author.csv")
colnames(author1)[1] <- ""
head(author1[,1:5])
                                 a   b   c   d
1       three daughters (buck) 550 116 147 374
2          drifters (michener) 515 109 172 311
3           lost world (clark) 590 112 181 265
4             east wind (buck) 557 129 128 343
5 farewell to arms (hemingway) 589  72 129 339
6  sound and fury 7 (faulkner) 541 109 136 228

str(author1[,1:5])
'data.frame':   12 obs. of  5 variables:
 $  : Factor w/ 12 levels "asia (michener)",..: 12 2 6 3 4 11 10 9 5 8 ...
 $ a: int  550 515 590 557 589 541 517 592 576 557 ...
 $ b: int  116 109 112 129 72 109 96 151 120 97 ...
 $ c: int  147 172 181 128 129 136 127 251 136 145 ...
 $ d: int  374 311 265 343 339 228 356 238 404 354 ...

ca(author1[,1:5])
Error in sum(N) : invalid 'type' (character) of argument

Is there a simple way to convert author1 back into the same form as the original author?

1 Answer:

Answer 0: (score: 2)

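What looks like the first column of author is actually its row names. Reading the CSV back in turns those row names into an ordinary data column, and renaming that column to "" does not change that; this is the source of the problem.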
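To illustrate why a leftover text column matters, here is a minimal sketch. It assumes (the error message `Error in sum(N)` suggests this, but the source does not confirm it) that ca coerces its input to a matrix before summing. The data frame `df` is a hypothetical two-row stand-in built from values shown in the question.

```r
# Hypothetical stand-in for the data frame read from the CSV:
# one text column (the titles) plus numeric count columns.
df <- data.frame(
  X = c("three daughters (buck)", "drifters (michener)"),
  a = c(550, 515),
  b = c(116, 109)
)

# A matrix can hold only one type, so a single text column
# forces every cell to character.
mat <- as.matrix(df)
print(typeof(mat))

# sum() is not defined for character data; capture the error message
# instead of stopping the script.
res <- tryCatch(sum(mat), error = function(e) conditionMessage(e))
print(res)
```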

The following works.

library(dplyr)  # provides select() and %>%
library(ca)

head(author[,1:5])

write.csv(author, file="author.csv")
author2 <- read.csv("author.csv")

head(author2[,1:5]) # here the row names are numbers and the titles sit in column X
                             X   a   b   c   d
1       three daughters (buck) 550 116 147 374
2          drifters (michener) 515 109 172 311
3           lost world (clark) 590 112 181 265
4             east wind (buck) 557 129 128 343
5 farewell to arms (hemingway) 589  72 129 339
6  sound and fury 7 (faulkner) 541 109 136 228

# set row names to be first column of the csv
rownames(author2) <- author2$X

# remove the first column
author2 %>% select(-X) -> author2

head(author2[,1:5]) # notice the row names have changed

                               a   b   c   d    e
three daughters (buck)       550 116 147 374 1015
drifters (michener)          515 109 172 311  827
lost world (clark)           590 112 181 265  940
east wind (buck)             557 129 128 343  996
farewell to arms (hemingway) 589  72 129 339  866
sound and fury 7 (faulkner)  541 109 136 228  763
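
A shorter route, offered as a sketch rather than as part of the original answer: `read.csv` accepts `row.names = 1`, which uses the first CSV column as row names directly, and `as.matrix` then returns a numeric matrix with the same dimnames layout as author. The matrix `m` below is a small stand-in built from the first three rows shown in the question, and `csv_path` is a hypothetical temporary file.

```r
# Small stand-in for the author matrix (values taken from the question).
m <- matrix(
  c(550, 116, 147, 374, 1015,
    515, 109, 172, 311,  827,
    590, 112, 181, 265,  940),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("three daughters (buck)", "drifters (michener)", "lost world (clark)"),
    c("a", "b", "c", "d", "e")
  )
)

csv_path <- tempfile(fileext = ".csv")  # hypothetical file location
write.csv(m, file = csv_path)

# row.names = 1 treats the first CSV column as row names, not as data;
# check.names = FALSE keeps the column names exactly as written.
restored <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

str(restored)
```

With the real data, the same two calls applied to "author.csv" should give a matrix that can be passed to `ca()` directly.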