我有时使用ftable
函数纯粹是为了呈现层次类别。但是,有时候,当表格很大时,我想在使用之前进一步对表格进行子集化。
我们说我们从以下开始:
mytable <- ftable(Titanic, row.vars = 1:3)
mytable
## Survived No Yes
## Class Sex Age
## 1st Male Child 0 5
## Adult 118 57
## Female Child 0 1
## Adult 4 140
## 2nd Male Child 0 11
## Adult 154 14
## Female Child 0 13
## Adult 13 80
## 3rd Male Child 35 13
## Adult 387 75
## Female Child 17 14
## Adult 89 76
## Crew Male Child 0 0
## Adult 670 192
## Female Child 0 0
## Adult 3 20
str(mytable)
## ftable [1:16, 1:2] 0 118 0 4 0 154 0 13 35 387 ...
## - attr(*, "row.vars")=List of 3
## ..$ Class: chr [1:4] "1st" "2nd" "3rd" "Crew"
## ..$ Sex : chr [1:2] "Male" "Female"
## ..$ Age : chr [1:2] "Child" "Adult"
## - attr(*, "col.vars")=List of 1
## ..$ Survived: chr [1:2] "No" "Yes"
## NULL
由于没有dimnames
,我无法以与dimnames
对象相同的方式提取数据。例如,我无法直接提取所有&#34; Child&#34;来自&#34; 1st&#34;的值和&#34;第三&#34;类。
我目前的做法是将其转换为table
,进行提取,然后将其转换回ftable
。
示例:
mytable[c("1st", "3rd"), , "Child", ]
## Error: incorrect number of dimensions
## Only the underlying data are seen as having dims
dim(mytable)
## [1] 16 2
## I'm OK with the "Age" column being dropped in this case....
ftable(as.table(mytable)[c("1st", "3rd"), , "Child", ])
## Survived No Yes
## Class Sex
## 1st Male 0 5
## Female 0 1
## 3rd Male 35 13
## Female 17 14
但是,我不喜欢这种方法,因为如果您不小心,整体布局有时会发生变化。将其与以下内容进行比较,这样就不再需要仅对子项进行子集化,并且只需要对那些不能生存的子项进行子集化的要求:
ftable(as.table(mytable)[c("1st", "3rd"), , , "No"])
## Age Child Adult
## Class Sex
## 1st Male 0 118
## Female 0 4
## 3rd Male 35 387
## Female 17 89
我不喜欢行和列的整体布局发生了变化。这是一个经典案例,在提取单个列时必须记住使用drop = FALSE
维护维度:
ftable(as.table(mytable)[c("1st", "3rd"), , , "No", drop = FALSE])
## Survived No
## Class Sex Age
## 1st Male Child 0
## Adult 118
## Female Child 0
## Adult 4
## 3rd Male Child 35
## Adult 387
## Female Child 17
## Adult 89
我知道有许多方式来获取我想要的数据,从原始数据的子集开始然后制作我的ftable
,但是对于这个问题,请让&#39;我认为这是不可能的。
最终目标是让我从ftable
中提取一种保留嵌套&#34;行&#34;的显示格式的方法。层次结构。
还有其他解决方案吗?我们是否可以使用row.vars
和col.vars
属性从ftable
中提取数据并保留其格式?
我目前的方法对分层列也不起作用,所以我希望所提出的解决方案也可以处理这些情况。
示例:
tab2 <- ftable(Titanic, row.vars = 1:2, col.vars = 3:4)
tab2
## Age Child Adult
## Survived No Yes No Yes
## Class Sex
## 1st Male 0 5 118 57
## Female 0 1 4 140
## 2nd Male 0 11 154 14
## Female 0 13 13 80
## 3rd Male 35 13 387 75
## Female 17 14 89 76
## Crew Male 0 0 670 192
## Female 0 0 3 20
注意&#34;年龄&#34;的嵌套和#34;幸存&#34;。
尝试我目前的做法:
ftable(as.table(tab2)[c("1st", "3rd"), , , , drop = FALSE])
## Survived No Yes
## Class Sex Age
## 1st Male Child 0 5
## Adult 118 57
## Female Child 0 1
## Adult 4 140
## 3rd Male Child 35 13
## Adult 387 75
## Female Child 17 14
## Adult 89 76
我可以回到我想要的东西:
ftable(as.table(tab2)[c("1st", "3rd"), , , , drop = FALSE], row.vars = 1:2, col.vars = 3:4)
但我希望能有更直接的东西。
答案 0 :(得分:12)
以下是some help来自Axeman的here所能解决的问题:
replace_empty_arguments <- function(a) {
empty_symbols <- vapply(a, function(x) {
is.symbol(x) && identical("", as.character(x)), 0)
}
a[!!empty_symbols] <- 0
lapply(a, eval)
}
`[.ftable` <- function (inftable, ...) {
if (!class(inftable) %in% "ftable") stop("input is not an ftable")
tblatr <- attributes(inftable)[c("row.vars", "col.vars")]
valslist <- replace_empty_arguments(as.list(match.call()[-(1:2)]))
x <- sapply(valslist, function(x) identical(x, 0))
TAB <- as.table(inftable)
valslist[x] <- dimnames(TAB)[x]
temp <- as.matrix(expand.grid(valslist))
out <- ftable(
`dimnames<-`(`dim<-`(TAB[temp], lengths(valslist)), valslist),
row.vars = seq_along(tblatr[["row.vars"]]),
col.vars = seq_along(tblatr[["col.vars"]]) + length(tblatr[["row.vars"]]))
names(attributes(out)[["row.vars"]]) <- names(tblatr[["row.vars"]])
names(attributes(out)[["col.vars"]]) <- names(tblatr[["col.vars"]])
out
}
尝试使用问题中的示例:
mytable[c("1st", "3rd"), , "Child", ]
## Survived No Yes
## Class Sex Age
## 1st Male Child 0 5
## Female Child 0 1
## 3rd Male Child 35 13
## Female Child 17 14
mytable[c("1st", "3rd"), , , "No"]
## Survived No
## Class Sex Age
## 1st Male Child 0
## Adult 118
## Female Child 0
## Adult 4
## 3rd Male Child 35
## Adult 387
## Female Child 17
## Adult 89
tab2[c("1st", "3rd"), , , ]
## Age Child Adult
## Survived No Yes No Yes
## Class Sex
## 1st Male 0 5 118 57
## Female 0 1 4 140
## 3rd Male 35 13 387 75
## Female 17 14 89 76
答案 1 :(得分:3)
Once the data is aggregated to frequencies by combination of factors as is the case with the Titanic
data set, it is arguably easier to subset the raw data and tabulate it for display rather than manipulating the output object.
I recognize that the OP asks for solutions using ftable
, but with the back and forth in the comments section soliciting other ideas, I thought I'd post a different take on this question because it illustrates a way to simultaneously subset the data and generate the hierarchical structure of the contingency tables without custom functions.
Here is an approach using the tables
package that preserves the hierarchical structure of the Titanic
data, as well as eliminating cells that are empty when we subset the data frame.
First we cast the incoming table as a data frame so we can subset it during the tabular()
function.
library(titanic)
df <- as.data.frame(Titanic)
Then we use tables::tabular()
while subsetting the data in the data=
argument with the extract operator [
, and use DropEmpty()
to avoid printing rows and columns where Freq == 0
. We also use Heading()
to suppress unwanted headings for Freq
and sum
.
tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])
...and the output:
> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+ data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])
Age
Child
Survived
Class Sex No Yes
1st Male 0 5
Female 0 1
3rd Male 35 13
Female 17 14
If we remove DropEmpty()
, we replicate the entire tabular structure based on the factor variables in the table.
> # remove DropEmpty() to replicate entire factor structure
> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum,
+ data=df[df$Class %in% c("1st","3rd") & df$Age=="Child",])
Age
Child Adult
Survived Survived
Class Sex No Yes No Yes
1st Male 0 5 0 0
Female 0 1 0 0
2nd Male 0 0 0 0
Female 0 0 0 0
3rd Male 35 13 0 0
Female 17 14 0 0
Crew Male 0 0 0 0
Female 0 0 0 0
>
Replicating the second and third examples from the OP is also straightforward.
> # second example from question
> tabular((Class * Sex * Age) ~ Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+ data=df[df$Class %in% c("1st","3rd") & df$Survived=="No",])
Survived
Class Sex Age No
1st Male Child 0
Adult 118
Female Child 0
Adult 4
3rd Male Child 35
Adult 387
Female Child 17
Adult 89
> # third example from question
> tabular((Class * Sex) ~ (Age)*Survived*Heading()*Freq*Heading()*sum*DropEmpty(empty=0),
+ data=df[df$Class %in% c("1st","3rd"),])
Age
Child Adult
Survived Survived
Class Sex No Yes No Yes
1st Male 0 5 118 57
Female 0 1 4 140
3rd Male 35 13 387 75
Female 17 14 89 76
>