I have a list (list1) made up of data frames (df1, df2, ..., dfn). Each data frame contains factor variables (f1, f2, ...) and numeric variables (n1, n2, ...). For example, let:
list1[[1]] (df1):
  f1 f2 f3 n1 n2
1  a  c  x 12  5
2  a  c  x  5 65
3  a  c  y 21 90
4  b  a  x 45  6
5  b  a  x 33 11
6  a  a  y  5 39
7  a  a  y 73 22
list1[[2]] (df2):
  f4 f5 n1 n2 n3
1  d  c 12  5 41
2  d  b  5 65 14
3  d  c 21 90 51
4  a  a 45  6 85
5  d  a 33 11  7
6  a  a  5 39  1
7  a  a 73 22 16
The desired output is list2:
list2[[1]] (df1):
  f2 f3 n1 n2
1  c  x 12  5
2  c  x  5 65
3  c  y 21 90
4  a  x 45  6
5  a  x 33 11
6  a  y  5 39
7  a  y 73 22
list2[[2]] (df2):
  f4 n1 n2 n3
1  d 12  5 41
2  d  5 65 14
3  d 21 90 51
4  a 45  6 85
5  d 33 11  7
6  a  5 39  1
7  a 73 22 16
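That is, if any level of a factor column has fewer than 3 observations, that column should be dropped. In the example above, list1$df1$f1 has only 2 "b" observations, which is fewer than 3, so f1 is dropped from the output. list1$df2$f5 has 1 "b" and 2 "c" observations, both fewer than 3, so f5 is dropped as well. How can I do this in R? Any help would be greatly appreciated. Thanks a lot.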
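To see which columns the rule removes, one can tabulate each factor column and take its smallest level count; a column is dropped when that minimum is below 3. The sketch below rebuilds the two example data frames with `data.frame(..., stringsAsFactors = TRUE)`, and `min_level_count` is a hypothetical helper name, not something from the question.

```r
# Rebuild the two example data frames (character columns become factors)
df1 <- data.frame(
  f1 = c("a", "a", "a", "b", "b", "a", "a"),
  f2 = c("c", "c", "c", "a", "a", "a", "a"),
  f3 = c("x", "x", "y", "x", "x", "y", "y"),
  n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
  n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
  stringsAsFactors = TRUE
)
df2 <- data.frame(
  f4 = c("d", "d", "d", "a", "d", "a", "a"),
  f5 = c("c", "b", "c", "a", "a", "a", "a"),
  n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
  n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
  n3 = c(41L, 14L, 51L, 85L, 7L, 1L, 16L),
  stringsAsFactors = TRUE
)

# Hypothetical helper: smallest level count of each factor column
min_level_count <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac], function(x) min(table(x)), integer(1))
}

min_level_count(df1)  # f1 = 2, f2 = 3, f3 = 3  -> f1 is dropped
min_level_count(df2)  # f4 = 3, f5 = 1          -> f5 is dropped
```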
Answer 0 (score: 2)
Is this what you want?
lapply(list1, function(df) df[, sapply(df, function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3)), drop = FALSE])
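It works on your list. Applied to a single data frame, the function keeps only the numeric columns and the factor columns in which every level has at least 3 observations: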
df1[, sapply(df1, function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3)), drop = FALSE]
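The same test can be wrapped in a small function with the threshold as a parameter. This is a sketch: `keep_common_factors` and its `min_n` argument are hypothetical names. `vapply` fixes the predicate's return type, and `drop = FALSE` keeps the result a data frame even when only one column survives. Note that `table()` on a factor also counts unused levels (as 0), so a factor with an empty level is always dropped; call `droplevels()` first if that is not wanted.

```r
# Hypothetical helper: keep numeric columns and factor columns whose
# every level occurs at least `min_n` times; all other columns are dropped.
keep_common_factors <- function(df, min_n = 3) {
  keep <- vapply(df, function(x) {
    is.numeric(x) || (is.factor(x) && min(table(x)) >= min_n)
  }, logical(1))
  df[, keep, drop = FALSE]  # drop = FALSE: never collapse to a vector
}

# Small demo: level "b" of g occurs only twice; both levels of h occur 3 times
demo <- data.frame(
  g = factor(c("a", "a", "a", "b", "b", "a")),
  h = factor(c("u", "u", "u", "v", "v", "v")),
  n = 1:6
)

lapply(list(demo), keep_common_factors)  # keeps h and n
keep_common_factors(demo, min_n = 2)     # keeps g, h and n
```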
To recreate list1, here is its dput output:
list1 <-
list(structure(list(f1 = structure(c(1L, 1L, 1L, 2L, 2L, 1L,
1L), .Label = c("a", "b"), class = "factor"), f2 = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 1L), .Label = c("a", "c"), class = "factor"),
f3 = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("x",
"y"), class = "factor"), n1 = c(12L, 5L, 21L, 45L, 33L, 5L,
73L), n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L)), .Names = c("f1",
"f2", "f3", "n1", "n2"), class = "data.frame", row.names = c(NA,
-7L)), structure(list(f4 = structure(c(2L, 2L, 2L, 1L, 2L, 1L,
1L), .Label = c("a", "d"), class = "factor"), f5 = structure(c(3L,
2L, 3L, 1L, 1L, 1L, 1L), .Label = c("a", "b", "c"), class = "factor"),
n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L), n2 = c(5L, 65L,
90L, 6L, 11L, 39L, 22L), n3 = c(41L, 14L, 51L, 85L, 7L, 1L,
16L)), .Names = c("f4", "f5", "n1", "n2", "n3"), class = "data.frame", row.names = c(NA,
-7L)))
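As a check against the desired list2, the sketch below applies the answer's predicate to list1 and compares the surviving column names. It swaps in base R's `Filter()` (an alternative to the answer's `df[, sapply(...)]`, not the answer's own code): on a data frame, `Filter()` subsets columns with `[`, so the result is always a data frame. To stay self-contained it rebuilds list1 with `data.frame(..., stringsAsFactors = TRUE)`, which gives the same values and alphabetically ordered factor levels as the dput; `keep_col` is a hypothetical name.

```r
# Rebuild list1 (same values as the dput; factors via stringsAsFactors)
list1 <- list(
  data.frame(f1 = c("a", "a", "a", "b", "b", "a", "a"),
             f2 = c("c", "c", "c", "a", "a", "a", "a"),
             f3 = c("x", "x", "y", "x", "x", "y", "y"),
             n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
             n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
             stringsAsFactors = TRUE),
  data.frame(f4 = c("d", "d", "d", "a", "d", "a", "a"),
             f5 = c("c", "b", "c", "a", "a", "a", "a"),
             n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
             n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
             n3 = c(41L, 14L, 51L, 85L, 7L, 1L, 16L),
             stringsAsFactors = TRUE)
)

# Same predicate as the answer: numeric, or factor with every level >= 3
keep_col <- function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3)

# Filter() keeps the columns for which keep_col() is TRUE
list2 <- lapply(list1, function(df) Filter(keep_col, df))

lapply(list2, names)  # list2[[1]]: f2 f3 n1 n2; list2[[2]]: f4 n1 n2 n3
```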