Question

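I'm working with the publicly available (and, I think, very cool) Titanic data.

There are two main ways to get it into R:

(1) you can use the built-in dataset Titanic (library(datasets)), or

(2) you can download it as a .csv file, e.g. here

The data are aggregated frequency data. I want to convert the multidimensional contingency table into an individual-level data frame.

Problem: with the built-in dataset this works fine, but with the imported .csv file it does not. This is the error message I get:

Error in rep(1:nrow(tablevars), counts) : invalid 'times' argument
In addition: Warning message:
In expand.table(Titanic.table) : NAs introduced by coercion

Why? What am I doing wrong? Many thanks.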

R CODE

#required packages
library(datasets)
library(epitools)

#(1) Expansion of built-in data set
data(Titanic)    
Titanic.raw <- Titanic
class(Titanic.raw) # data is stored as "table"
Titanic.expand <- expand.table(Titanic.raw)

#(2) Expansion of imported data set
Titanic.raw <- read.table("Titanic.csv", header=TRUE, sep=",", row.names=1)
class(Titanic.raw) #data is stored as "data.frame"

Titanic.table <- as.table(as.matrix(Titanic.raw)) 
class(Titanic.table) #data is stored as "table"

Titanic.expand <- expand.table(Titanic.table)

Answer 1

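I think you probably want xtabs. Note that the factor coding differs between the Titanic and Titanic.new objects: by default, factor levels are sorted lexicographically, whereas two of the factors in Titanic (Sex and Age) are not: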

 str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"

 Titanic.raw <- read.table("~/Downloads/Titanic.csv", header=TRUE, sep=",", row.names=1)

 str( Titanic.new <- 
               xtabs( Freq ~ Class + Sex + Age +Survived, data=Titanic.raw))

 xtabs [1:4, 1:2, 1:2, 1:2] 4 13 89 3 118 154 387 670 0 0 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Female" "Male"
  ..$ Age     : chr [1:2] "Adult" "Child"
  ..$ Survived: chr [1:2] "No" "Yes"
 - attr(*, "class")= chr [1:2] "xtabs" "table"
 - attr(*, "call")= language xtabs(formula = Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)

An 'xtabs' object inherits from the 'table' class, so you can pass it to the expand.table function.
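
As a rough sketch of the whole round trip, the following avoids both the .csv file and epitools so that it runs in a plain R session. Two things here are assumptions rather than facts from the question: the .csv is taken to be in long format with columns Class, Sex, Age, Survived and Freq (imitated with as.data.frame(Titanic)), and the expansion is done in base R with rep() instead of expand.table. It also hints at why the original attempt probably fails: as.matrix() on a data frame with text columns yields a character matrix, so the counts are no longer numeric, which would be consistent with the "NAs introduced by coercion" warning.

```r
# Stand-in for the imported file (assumed layout: Class, Sex, Age, Survived, Freq).
Titanic.raw <- as.data.frame(Titanic, stringsAsFactors = FALSE)

# Likely cause of the error: mixing text and numbers forces a character matrix.
m <- as.matrix(Titanic.raw)
is.character(m)

# Rebuild the 4-way contingency table from the long-format counts.
Titanic.new <- xtabs(Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)

# Optional: restore the level order of the built-in table before tabulating.
Titanic.raw$Sex <- factor(Titanic.raw$Sex, levels = c("Male", "Female"))
Titanic.raw$Age <- factor(Titanic.raw$Age, levels = c("Child", "Adult"))
Titanic.ordered <- xtabs(Freq ~ Class + Sex + Age + Survived, data = Titanic.raw)

# Base-R expansion: repeat each combination Freq times, then drop the count.
long <- as.data.frame(Titanic.ordered)
Titanic.expand <- long[rep(seq_len(nrow(long)), long$Freq), names(long) != "Freq"]
rownames(Titanic.expand) <- NULL
```

With epitools loaded, expand.table(Titanic.ordered) should be usable in place of the last three lines, since the object is still a 'table'.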

Importing a contingency table (.csv format) as a "table" rather than a "data.frame" in R