How do I convert data between different forms?

Time: 2013-11-20 11:20:25

Tags: r

data1 is a data set from the web:
http://stats.math.uni-augsburg.de/Mondrian/Data/Titanic.txt

Given data1, how can I get a table (call it data2) like the following:

, , Age = Child, Survived = No

      Sex
Class  Male Female
  1st     0      0
  2nd     0      0
  3rd    35     17
  Crew    0      0

And given data2, as follows:

, , Age = Child, Survived = No

      Sex    
Class  Male Female    
  1st     0      0    
  2nd     0      0    
  3rd    35     17    
  Crew    0      0    

, , Age = Adult, Survived = No    

      Sex    
Class  Male Female    
  1st   118      4    
  2nd   154     13    
  3rd   387     89    
  Crew  670      3    

, , Age = Child, Survived = Yes    

      Sex    
Class  Male Female    
  1st     5      1    
  2nd    11     13    
  3rd    13     14    
  Crew    0      0    

, , Age = Adult, Survived = Yes    

      Sex    
Class  Male Female    
  1st    57    140    
  2nd    14     80    
  3rd    75     76    
  Crew  192     20    

how can I convert data2 back to data1?

1. Converting data1 to data2
I can do part of it:

url <- 'http://stats.math.uni-augsburg.de/Mondrian/Data/Titanic.txt'
data <- read.table(url, header = TRUE)
# Class and Sex (columns 1 and 3) of the children who did not survive
data[data$Age=="Child" & data$Survived =="No",][,c(1,3)]

2. Converting data2 to data1
I don't know how to do this.

I don't want to take a subset of Titanic from Titanic. How do I get the Titanic table from a CSV file? How do I get a CSV file from the Titanic table?

When I write Titanic to a file, the data are not in the same form as the data on the web (http://stats.math.uni-augsburg.de/Mondrian/Data/Titanic.txt).
This is what was written:

    "","Class","Sex","Age","Survived","Freq"    
    "1","1st","Male","Child","No",0    
    "2","2nd","Male","Child","No",0    
    "3","3rd","Male","Child","No",35    
    "4","Crew","Male","Child","No",0    
    "5","1st","Female","Child","No",0    
    "6","2nd","Female","Child","No",0    
    "7","3rd","Female","Child","No",17    
    "8","Crew","Female","Child","No",0    
    "9","1st","Male","Adult","No",118    
    "10","2nd","Male","Adult","No",154    
    "11","3rd","Male","Adult","No",387    
    "12","Crew","Male","Adult","No",670    
    "13","1st","Female","Adult","No",4    
    "14","2nd","Female","Adult","No",13    
    "15","3rd","Female","Adult","No",89    
    "16","Crew","Female","Adult","No",3    
    "17","1st","Male","Child","Yes",5    
    "18","2nd","Male","Child","Yes",11    
    "19","3rd","Male","Child","Yes",13    
    "20","Crew","Male","Child","Yes",0    
    "21","1st","Female","Child","Yes",1    
    "22","2nd","Female","Child","Yes",13    
    "23","3rd","Female","Child","Yes",14    
    "24","Crew","Female","Child","Yes",0    
    "25","1st","Male","Adult","Yes",57    
    "26","2nd","Male","Adult","Yes",14    
    "27","3rd","Male","Adult","Yes",75    
    "28","Crew","Male","Adult","Yes",192    
    "29","1st","Female","Adult","Yes",140    
    "30","2nd","Female","Adult","Yes",80    
    "31","3rd","Female","Adult","Yes",76    
    "32","Crew","Female","Adult","Yes",20    

This is not the data I want.

2 Answers:

Answer 0 (score: 1)

Titanic is a "table" object, so you need to explore it a little to understand what you are looking at:

> str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"
> dim(Titanic)
[1] 4 2 2 2
> dimnames(Titanic)
$Class
[1] "1st"  "2nd"  "3rd"  "Crew"

$Sex
[1] "Male"   "Female"

$Age
[1] "Child" "Adult"

$Survived
[1] "No"  "Yes"

Use the dim and dimnames to extract the part of the table you want:

> Titanic[,,'Child','No']
      Sex
Class  Male Female
  1st     0      0
  2nd     0      0
  3rd    35     17
  Crew    0      0
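To get every panel of data2 rather than just one, the same indexing can be looped over the dimnames. A minimal sketch using only the built-in Titanic data set; the loop variable names and the header format passed to sprintf() are illustrative choices made to mimic the printed output:

```r
# Walk every Survived x Age combination and print the Class x Sex slice.
# Survived is the outer loop so the panels come out in the same order as
# print(Titanic): Child/No, Adult/No, Child/Yes, Adult/Yes.
for (surv in dimnames(Titanic)$Survived) {
  for (age in dimnames(Titanic)$Age) {
    cat(sprintf(", , Age = %s, Survived = %s\n\n", age, surv))
    print(Titanic[, , age, surv])   # index the 3rd and 4th dimensions by name
    cat("\n")
  }
}
```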

For the data you loaded from the web, you just need to wrap your last line of code in table:

table(data[data$Age=="Child" & data$Survived =="No",][,c(1,3)])
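The web file may not be reachable from every environment, so here is a sketch that fakes it offline, assuming the web data is simply one row per passenger with columns Class, Age, Sex and Survived. The stand-in (the hypothetical names freq and people) is rebuilt from the built-in Titanic table, so its Class labels are 1st/2nd/3rd/Crew rather than the First/Second/Third/Crew used in the web file:

```r
# Hypothetical offline stand-in for the web file: one row per passenger,
# rebuilt from the built-in Titanic table so no download is needed
freq <- as.data.frame(Titanic)   # columns Class, Sex, Age, Survived, Freq
people <- freq[rep(seq_len(nrow(freq)), freq$Freq),
               c("Class", "Sex", "Age", "Survived")]

# Subset the rows, keep Class and Sex, and wrap the result in table()
child_no <- table(people[people$Age == "Child" & people$Survived == "No",
                         c("Class", "Sex")])
child_no
```

Because Class and Sex are factors here, table() keeps the zero-count classes as well, so the result has the same 4 x 2 shape as Titanic[,,'Child','No']. If the columns are plain character vectors (read.table() stopped converting strings to factors by default in R 4.0.0), the zero-count classes are dropped; converting the columns with factor() first keeps them.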

Answer 1 (score: 0)

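Perhaps I'm misunderstanding your question, but it seems you want to know how to specify the order in which things are laid out in a multidimensional table.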

If that's the case, try this (rows first, then columns, then the third dimension (Age), then the fourth dimension (Survived)):

data2 <- table(data[c("Class", "Sex", "Age", "Survived")]) 
## table(data[c(1, 3, 2, 4)])
data2
# , , Age = Adult, Survived = No
# 
#         Sex
# Class    Female Male
#   Crew        3  670
#   First       4  118
#   Second     13  154
#   Third      89  387
# 
# <<SNIP>>
#
#
# , , Age = Child, Survived = Yes
# 
#         Sex
# Class    Female Male
#   Crew        0    0
#   First       1    5
#   Second     13   11
#   Third      14   13
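To make the point about ordering concrete, here is an offline sketch on a hypothetical per-passenger stand-in rebuilt from the built-in Titanic table (not the web file). It shows that the column order given to table() fixes the dimension order and, as an alternative not mentioned above, that aperm() can permute the dimensions of a table that already exists:

```r
# Hypothetical per-passenger stand-in built from the built-in Titanic table
freq <- as.data.frame(Titanic)
people <- freq[rep(seq_len(nrow(freq)), freq$Freq),
               c("Class", "Sex", "Age", "Survived")]

# The order of the columns passed to table() becomes the order of the dimensions
t1 <- table(people[c("Class", "Sex", "Age", "Survived")])
t2 <- table(people[c("Sex", "Class", "Survived", "Age")])
dim(t1)   # 4 2 2 2
dim(t2)   # 2 4 2 2

# An existing table can be reordered with aperm() instead of being rebuilt;
# with named dimnames, the permutation can be given by dimension name
t3 <- aperm(t1, c("Sex", "Class", "Survived", "Age"))
all(t3 == t2)   # TRUE
```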

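As for the second part of your question, it sounds like "how do I recreate a flat/rectangular data.frame from tabulated data?" For this particular example, you can try something like: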

X <- data.frame(data2)
X <- X[rep(rownames(X), X$Freq), -length(X)]
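The same expansion can be checked offline against the built-in Titanic table. This sketch is a small variant of the code above: it indexes rows with seq_len(nrow(X)) instead of row names and drops the Freq column by name, then confirms that tabulating the expanded data gives the original table back:

```r
# Start from the built-in Titanic table instead of the downloaded data2
X <- as.data.frame(Titanic)                          # one row per cell + Freq
# Repeat each row Freq times (cells with Freq 0 disappear) and drop Freq
X <- X[rep(seq_len(nrow(X)), X$Freq), names(X) != "Freq"]
rownames(X) <- NULL
nrow(X)                    # 2201, one row per person
all(table(X) == Titanic)   # TRUE: tabulating again gives the original table
```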

Compare the summary of the recreated data with the summary of the original data:

summary(X)
#     Class         Sex          Age       Survived  
#  Crew  :885   Female: 470   Adult:2092   No :1490  
#  First :325   Male  :1731   Child: 109   Yes: 711  
#  Second:285                                        
#  Third :706  
summary(data)
#     Class        Age           Sex       Survived  
#  Crew  :885   Adult:2092   Female: 470   No :1490  
#  First :325   Child: 109   Male  :1731   Yes: 711  
#  Second:285                                        
#  Third :706   
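For the CSV part of the question, here is a sketch of a round trip through a temporary file (the tempfile() path is just a scratch location). as.data.frame(Titanic) produces the frequency form shown in the question; writing it with row.names = FALSE avoids the extra "" column, and xtabs() turns the Freq column back into a table without expanding to one row per person:

```r
# Table -> CSV: one row per cell plus a Freq column
path <- tempfile(fileext = ".csv")
write.csv(as.data.frame(Titanic), path, row.names = FALSE)

# CSV -> table
back <- read.csv(path)
# read.csv() returns character columns (R >= 4.0.0) or factors with
# alphabetical levels (older R); restore Titanic's level order either way
for (v in names(dimnames(Titanic))) {
  back[[v]] <- factor(back[[v]], levels = dimnames(Titanic)[[v]])
}
# Sum Freq over every Class x Sex x Age x Survived combination
tab <- xtabs(Freq ~ Class + Sex + Age + Survived, data = back)
all(tab == Titanic)   # TRUE
```

Restoring the level order only matters for matching Titanic's layout; without it, xtabs() still produces the right counts, just with Female before Male and Adult before Child.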

Then again, I'm shooting in the dark, since your question isn't very clear. Sorry!