使用`ftable`属性提取数据

时间:2017-12-26 06:00:24

标签: r subset

我有时使用ftable函数纯粹是为了呈现层次类别。但是,有时候,当表格很大时,我想在使用之前进一步对表格进行子集化。

我们说我们从以下开始:

mytable <- ftable(Titanic, row.vars = 1:3)
mytable
##                    Survived  No Yes
## Class Sex    Age                   
## 1st   Male   Child            0   5
##              Adult          118  57
##       Female Child            0   1
##              Adult            4 140
## 2nd   Male   Child            0  11
##              Adult          154  14
##       Female Child            0  13
##              Adult           13  80
## 3rd   Male   Child           35  13
##              Adult          387  75
##       Female Child           17  14
##              Adult           89  76
## Crew  Male   Child            0   0
##              Adult          670 192
##       Female Child            0   0
##              Adult            3  20

str(mytable)
##  ftable [1:16, 1:2] 0 118 0 4 0 154 0 13 35 387 ...
##  - attr(*, "row.vars")=List of 3
##   ..$ Class: chr [1:4] "1st" "2nd" "3rd" "Crew"
##   ..$ Sex  : chr [1:2] "Male" "Female"
##   ..$ Age  : chr [1:2] "Child" "Adult"
##  - attr(*, "col.vars")=List of 1
##   ..$ Survived: chr [1:2] "No" "Yes"
## NULL

由于没有dimnames,我无法以与dimnames对象相同的方式提取数据。例如,我无法直接提取所有&#34; Child&#34;来自&#34; 1st&#34;的值和&#34;第三&#34;类。

我目前的做法是将其转换为table,进行提取,然后将其转换回ftable

示例:

mytable[c("1st", "3rd"), , "Child", ]
## Error: incorrect number of dimensions

## Only the underlying data are seen as having dims
dim(mytable)
## [1] 16  2

## I'm OK with the "Age" column being dropped in this case....
ftable(as.table(mytable)[c("1st", "3rd"), , "Child", ])
##              Survived No Yes
## Class Sex                   
## 1st   Male             0   5
##       Female           0   1
## 3rd   Male            35  13
##       Female          17  14

但是,我不喜欢这种方法,因为如果您不小心,整体布局有时会发生变化。将其与以下内容进行比较,这样就不再需要仅对子项进行子集化,并且只需要对那些不能生存的子项进行子集化的要求:

ftable(as.table(mytable)[c("1st", "3rd"), , , "No"])
##              Age Child Adult
## Class Sex                   
## 1st   Male           0   118
##       Female         0     4
## 3rd   Male          35   387
##       Female        17    89

我不喜欢行和列的整体布局发生了变化。这是一个经典案例,在提取单个列时必须记住使用drop = FALSE维护维度:

ftable(as.table(mytable)[c("1st", "3rd"), , , "No", drop = FALSE])
##                    Survived  No
## Class Sex    Age               
## 1st   Male   Child            0
##              Adult          118
##       Female Child            0
##              Adult            4
## 3rd   Male   Child           35
##              Adult          387
##       Female Child           17
##              Adult           89

我知道有许多方式来获取我想要的数据,从原始数据的子集开始然后制作我的ftable,但是对于这个问题,请让&#39;我认为这是不可能的。

最终目标是让我从ftable中提取一种保留嵌套&#34;行&#34;的显示格式的方法。层次结构。

还有其他解决方案吗?我们是否可以使用row.varscol.vars属性从ftable中提取数据并保留其格式?

我目前的方法对分层列也不起作用,所以我希望所提出的解决方案也可以处理这些情况。

示例:

tab2 <- ftable(Titanic, row.vars = 1:2, col.vars = 3:4)
tab2
##              Age      Child     Adult    
##              Survived    No Yes    No Yes
## Class Sex                                
## 1st   Male                0   5   118  57
##       Female              0   1     4 140
## 2nd   Male                0  11   154  14
##       Female              0  13    13  80
## 3rd   Male               35  13   387  75
##       Female             17  14    89  76
## Crew  Male                0   0   670 192
##       Female              0   0     3  20

注意&#34;年龄&#34;的嵌套和#34;幸存&#34;。

尝试我目前的做法:

ftable(as.table(tab2)[c("1st", "3rd"), , , , drop = FALSE])
##                    Survived  No Yes
## Class Sex    Age                   
## 1st   Male   Child            0   5
##              Adult          118  57
##       Female Child            0   1
##              Adult            4 140
## 3rd   Male   Child           35  13
##              Adult          387  75
##       Female Child           17  14
##              Adult           89  76

我可以回到我想要的东西:

ftable(as.table(tab2)[c("1st", "3rd"), , , , drop = FALSE], row.vars = 1:2, col.vars = 3:4)

但我希望能有更直接的东西。

2 个答案:

答案 0 :(得分:12)

以下是some help来自Axemanhere所能解决的问题:

replace_empty_arguments <- function(a) {
  empty_symbols <- vapply(a, function(x) {
    is.symbol(x) && identical("", as.character(x)), 0)
  } 
  a[!!empty_symbols] <- 0
  lapply(a, eval)
}

`[.ftable` <- function (inftable, ...) {
  if (!class(inftable) %in% "ftable") stop("input is not an ftable")
  tblatr <- attributes(inftable)[c("row.vars", "col.vars")]
  valslist <- replace_empty_arguments(as.list(match.call()[-(1:2)]))
  x <- sapply(valslist, function(x) identical(x, 0))
  TAB <- as.table(inftable)
  valslist[x] <- dimnames(TAB)[x]
  temp <- as.matrix(expand.grid(valslist))
  out <- ftable(
    `dimnames<-`(`dim<-`(TAB[temp], lengths(valslist)), valslist),
    row.vars = seq_along(tblatr[["row.vars"]]),
    col.vars = seq_along(tblatr[["col.vars"]]) + length(tblatr[["row.vars"]]))
  names(attributes(out)[["row.vars"]]) <- names(tblatr[["row.vars"]])
  names(attributes(out)[["col.vars"]]) <- names(tblatr[["col.vars"]])
  out
}

尝试使用问题中的示例:

mytable[c("1st", "3rd"), , "Child", ]
##                    Survived No Yes
## Class Sex    Age                  
## 1st   Male   Child           0   5
##       Female Child           0   1
## 3rd   Male   Child          35  13
##       Female Child          17  14

mytable[c("1st", "3rd"), , , "No"]
##                    Survived  No
## Class Sex    Age               
## 1st   Male   Child            0
##              Adult          118
##       Female Child            0
##              Adult            4
## 3rd   Male   Child           35
##              Adult          387
##       Female Child           17
##              Adult           89

tab2[c("1st", "3rd"), , , ]
##              Age      Child     Adult    
##              Survived    No Yes    No Yes
## Class Sex                                
## 1st   Male                0   5   118  57
##       Female              0   1     4 140
## 3rd   Male               35  13   387  75
##       Female             17  14    89  76

答案 1 :(得分:3)

Once the data is aggregated to frequencies by combination of factors as is the case with the Titanic data set, it is arguably easier to subset the raw data and tabulate it for display rather than manipulating the output object.

I recognize that the OP asks for solutions using ftable, but with the back and forth in the comments section soliciting other ideas, I thought I'd post a different take on this question because it illustrates a way to simultaneously subset the data and generate the hierarchical structure of the contingency tables without custom functions.

Here is an approach using the tables package that preserves the hierarchical structure of the Titanic data, as well as eliminating cells that are empty when we subset the data frame.

First we cast the incoming table as a data frame so we can subset it during the tabular() function.

 library(titanic)
 df <- as.data.frame(Titanic)

Then we use tables::tabular() while subsetting the data in the data= argument with the extract operator [, and use DropEmpty() to avoid printing rows and columns where Freq == 0. We also use Heading() to suppress unwanted headings for Freq and sum.

tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
        data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])

...and the output:

> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+         data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])

              Age         
              Child       
              Survived    
 Class Sex    No       Yes
 1st   Male    0        5 
       Female  0        1 
 3rd   Male   35       13 
       Female 17       14

If we remove DropEmpty(), we replicate the entire tabular structure based on the factor variables in the table.

> # remove DropEmpty() to replicate entire factor structure
> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum,
+         data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])

              Age                      
              Child        Adult       
              Survived     Survived    
 Class Sex    No       Yes No       Yes
 1st   Male    0        5  0        0  
       Female  0        1  0        0  
 2nd   Male    0        0  0        0  
       Female  0        0  0        0  
 3rd   Male   35       13  0        0  
       Female 17       14  0        0  
 Crew  Male    0        0  0        0  
       Female  0        0  0        0  
> 

Replicating the second and third examples from the OP is also straightforward.

> # second example from question
> tabular((Class * Sex * Age) ~ Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+         data=df[df$Class %in% c("1st","3rd") & df$Survived=="No",])

                    Survived
 Class Sex    Age   No      
 1st   Male   Child   0     
              Adult 118     
       Female Child   0     
              Adult   4     
 3rd   Male   Child  35     
              Adult 387     
       Female Child  17     
              Adult  89     
> # third example from question 
> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+         data=df[df$Class %in% c("1st","3rd"),])

              Age                      
              Child        Adult       
              Survived     Survived    
 Class Sex    No       Yes No       Yes
 1st   Male    0        5  118       57
       Female  0        1    4      140
 3rd   Male   35       13  387       75
       Female 17       14   89       76
>