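I am using ftable to create a flat contingency table. However, when I subset the table, R drops the row and column names. Is there a way to subset the table so that the row and column names are kept in the result? Here is an example: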
# Create fake data
Group1 = sample(LETTERS[1:3], 20, replace=TRUE)
Group2 = sample(letters[1:3], 20, replace=TRUE)
Year = sample(c("2010","2011","2012"), 20, replace=TRUE)
df1 = data.frame(Group1, Group2, Year)
# Create flat contingency table with column margin
table1 = ftable(addmargins(table(df1$Group1, df1$Group2, df1$Year), margin=3))
# Select rows with sum greater than 2
table2 = table1[table1[ ,4] > 2, ]
> table1
2010 2011 2012 Sum
A a 0 1 2 3
b 2 1 0 3
c 0 0 0 0
B a 0 1 1 2
b 2 0 0 2
c 1 0 1 2
C a 0 1 0 1
b 1 0 2 3
c 3 0 1 4
> table2
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 2 1 0 3
[3,] 1 0 2 3
[4,] 3 0 1 4
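Notice how R converts the subsetted table to a matrix, dropping the column names and both levels of row names. How can I keep the ftable structure in the subset?
Answer 0 (score: 4)
The result will no longer be an ftable object, because some combinations are missing. But you can use a matrix instead, with row and column names.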
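ftable_names <- function(x, which = "row.vars") {
  # Build one label per row (or column) by pasting the levels of each variable.
  # Only tested in dimensions 1 and 2
  labels <- Reduce(
    function(u, v) t(outer(as.vector(u), as.vector(v), paste)),
    attr(x, which),
    ""
  )
  # The "" seed leaves a leading space on every label; strip it
  trimws(as.vector(labels))
}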
i <- table1[ ,4] > 2
table2 <- table1[i,]
rownames(table2) <- ftable_names(table1, "row.vars")[i]
colnames(table2) <- ftable_names(table1, "col.vars")
table2
# 2010 2011 2012 Sum
# A a 1 2 0 3
# A c 0 0 3 3
# B c 0 3 0 3
# C a 3 1 1 5
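The label construction above can also be sketched with expand.grid, which may be easier to read than the Reduce/outer version. This is only an illustration under stated assumptions: `ftable_labels` is a hypothetical helper name, the seed and the explicit factor levels are added so the example is reproducible, and it relies on ftable ordering its rows with the last row variable varying fastest, as in the printed table1 above.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
Group1 <- factor(sample(LETTERS[1:3], 20, replace = TRUE), levels = LETTERS[1:3])
Group2 <- factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3])
Year   <- factor(sample(c("2010", "2011", "2012"), 20, replace = TRUE),
                 levels = c("2010", "2011", "2012"))
table1 <- ftable(addmargins(table(Group1, Group2, Year), margin = 3))

# Hypothetical helper: one label per row (or column) of an ftable
ftable_labels <- function(x, which = "row.vars") {
  vars <- attr(x, which)
  # expand.grid varies its first variable fastest, ftable its last,
  # so reverse the list going in and the columns coming out
  grid <- expand.grid(rev(vars), stringsAsFactors = FALSE)
  do.call(paste, rev(grid))
}

# Copy the counts into a plain matrix and attach the labels
m <- matrix(as.vector(table1), nrow = nrow(table1))
dimnames(m) <- list(ftable_labels(table1, "row.vars"),
                    ftable_labels(table1, "col.vars"))

# Subset by name; drop = FALSE keeps a matrix even if one row matches
keep <- m[, "Sum"] > 2
m[keep, , drop = FALSE]
```

Because the labels are attached before subsetting, the logical index never has to be applied to the names separately.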
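Answer 1 (score: 4)
Consider using a data frame of frequencies. It is a better data structure, especially if you are going to filter it. Here is how to build one with the reshape package.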
# cast the data into a data.frame
library(reshape)
df1$Freq <- 1
df2 <- cast(df1, Group1 + Group2 ~ Year, fun = sum, value = "Freq")
df2
# Group1 Group2 2010 2011 2012
# 1 A a 0 0 1
# 2 A b 1 1 3
# 3 A c 0 0 1
# 4 B a 1 2 0
# 5 B b 1 1 0
# 6 B c 0 0 1
# 7 C a 2 0 1
# 8 C b 2 0 0
# 9 C c 0 0 2
# add a column for the `Sum` of frequencies over the years
df2 <- within(df2, Sum <- `2010` + `2011` + `2012`)
df2
# Group1 Group2 2010 2011 2012 Sum
# 1 A a 0 0 1 1
# 2 A b 1 1 3 5
# 3 A c 0 0 1 1
# 4 B a 1 2 0 3
# 5 B b 1 1 0 2
# 6 B c 0 0 1 1
# 7 C a 2 0 1 3
# 8 C b 2 0 0 2
# 9 C c 0 0 2 2
df2[df2$Sum > 2, ]
# Group1 Group2 2010 2011 2012 Sum
# 2 A b 1 1 3 5
# 4 B a 1 2 0 3
# 7 C a 2 0 1 3
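If installing reshape is not an option, a similar frequency data frame can be sketched in base R. This is a minimal sketch, not the answer's method: the seed and explicit factor levels are assumptions added for reproducibility, and the year columns come out named `Freq.2010`, `Freq.2011`, `Freq.2012` rather than `2010`, `2011`, `2012`.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
df1 <- data.frame(
  Group1 = factor(sample(LETTERS[1:3], 20, replace = TRUE), levels = LETTERS[1:3]),
  Group2 = factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3]),
  Year   = factor(sample(c("2010", "2011", "2012"), 20, replace = TRUE),
                  levels = c("2010", "2011", "2012"))
)

# Long format: one row per Group1/Group2/Year combination with its count
freq <- as.data.frame(table(df1))

# Wide format: one row per Group1/Group2 pair, one Freq.<year> column per year
wide <- reshape(freq, idvar = c("Group1", "Group2"),
                timevar = "Year", direction = "wide")

# Row total across the year columns, then an ordinary data frame filter
wide$Sum <- rowSums(wide[, -(1:2)])
kept <- wide[wide$Sum > 2, ]
kept
```

The group labels stay in ordinary columns, so filtering cannot drop them.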
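Answer 2 (score: 3)
ftable creates 'flat' contingency tables [by] ... re-arranging the data into a [2D] matrix. So simply use as.matrix to convert the data to a matrix before subsetting (if you use as.table directly, the data reverts to its higher-dimensional structure).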
# Create flat contingency table with column margin and variable names
table1 <- ftable(addmargins(table(Group1 = df1$Group1,
Group2 = df1$Group2,
Year = df1$Year), margin=3))
# Convert to matrix
mat1 <- as.matrix(table1)
mat2 <- mat1[mat1[ ,4] > 2, ]
> mat2
Year
Group1_Group2 2010 2011 2012 Sum
A_b 3 0 0 3
A_c 0 2 3 5
B_b 2 0 1 3
If you really dislike the "_", replace it with gsub.
dimnames(mat2) <- rapply(dimnames(mat2), gsub, pattern = "_", replacement = " ", how = "replace")
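To make the difference between the two conversions concrete, the following sketch compares the dimensions that as.matrix and as.table return for the same flat table. The seed and the explicit factor levels are assumptions added so that every level appears and the dimensions are predictable.

```r
set.seed(1)  # assumption: fixed seed, not part of the original question
Group1 <- factor(sample(LETTERS[1:3], 20, replace = TRUE), levels = LETTERS[1:3])
Group2 <- factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3])
Year   <- factor(sample(c("2010", "2011", "2012"), 20, replace = TRUE),
                 levels = c("2010", "2011", "2012"))
flat <- ftable(addmargins(table(Group1, Group2, Year), margin = 3))

# as.matrix keeps the flat 2D layout: 9 group pairs by 3 years plus Sum
dim_matrix <- dim(as.matrix(flat))   # 9 4

# as.table goes back to the 3D array: Group1 x Group2 x (Year + Sum)
dim_table <- dim(as.table(flat))     # 3 3 4

# Subset the 2D matrix by column name; drop = FALSE guards against
# the result collapsing to a vector when only one row matches
mat <- as.matrix(flat)
mat_kept <- mat[mat[, "Sum"] > 2, , drop = FALSE]
mat_kept
```

Indexing by the column name "Sum" rather than by position 4 keeps the filter correct if the number of years changes.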
Alternatively, use the dplyr and tidyr packages for more flexible and readable code:
library(dplyr)
library(tidyr)
df1 %>%
group_by(Group1, Group2, Year) %>%
tally() %>%
spread(Year, n, fill = 0) %>%
ungroup() %>%
mutate(Sum = rowSums(.[-(1:2)])) %>%
filter(Sum > 2) %>%
unite(Name, c(Group1, Group2), sep = " ")
Source: local data frame [5 x 5]
Name 2010 2011 2012 Sum
(chr) (dbl) (dbl) (dbl) (dbl)
1 A a 2 1 0 3
2 A b 1 1 1 3
3 B b 2 0 2 4
4 B c 1 2 0 3
5 C a 1 2 0 3