Remove data frame columns based on a condition within a list - R

Date: 2016-04-05 18:07:33

Tags: r

I have a list (list1) made up of data frames (df1, df2, ..., dfn). Each data frame contains factor (f1, f2, ...) and numeric (n1, n2, ...) variables. For example:

list1[[1]]:

df1:

f1   f2   f3   n1   n2
---  ---  ---  ---  ---
a    c    x    12   5
a    c    x    5    65
a    c    y    21   90
b    a    x    45   6
b    a    x    33   11
a    a    y    5    39
a    a    y    73   22

list1[[2]]:

df2:

f4   f5   n1   n2   n3
---  ---  ---  ---  ---
d    c    12   5    41
d    b    5    65   14
d    c    21   90   51
a    a    45   6    85
d    a    33   11   7
a    a    5    39   1
a    a    73   22   16

The desired output is list2:

list2[[1]]:

df1:

f2   f3   n1   n2
---  ---  ---  ---
c    x    12   5
c    x    5    65
c    y    21   90
a    x    45   6
a    x    33   11
a    y    5    39
a    y    73   22

list2[[2]]:

df2:

f4   n1   n2   n3
---  ---  ---  ---
d    12   5    41
d    5    65   14
d    21   90   51
a    45   6    85
d    33   11   7
a    5    39   1
a    73   22   16

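That is, if any level of a factor column has fewer than 3 observations, that column should be removed. For the example above:

  • list1$df1$f1 has 2 "b" observations, which is fewer than 3, so f1 is removed from the output.
  • list1$df2$f5 has 1 "b" and 2 "c" observations, both fewer than 3, so f5 is removed from the output.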
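
The rule above can be checked one column at a time with table(), which counts the observations per level. This is only a minimal sketch: the vectors are copied from the example tables, and the names counts_f1, counts_f5, drop_f1 and drop_f5 are made up for illustration.

```r
# Factor columns copied from the example tables (df1$f1 and df2$f5)
f1 <- factor(c("a", "a", "a", "b", "b", "a", "a"))
f5 <- factor(c("c", "b", "c", "a", "a", "a", "a"))

# table() counts the observations for each level
counts_f1 <- table(f1)  # a: 5, b: 2
counts_f5 <- table(f5)  # a: 4, b: 1, c: 2

# A column is flagged for removal when its rarest level has fewer than 3 observations
drop_f1 <- min(counts_f1) < 3
drop_f5 <- min(counts_f5) < 3
```

Both flags come out TRUE here, which matches f1 and f5 being absent from the desired output.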

How can I do this in R? Any help would be much appreciated. Thank you very much.

1 answer:

Answer 0: (score: 2)

Is this what you want?

lapply(list1, function(df) df[, sapply(df, function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3)), drop = FALSE])

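It works on your list. For a single data frame, the function keeps only the numeric columns and the factors in which every level has at least 3 observations: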

df1[, sapply(df1, function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3)), drop = FALSE]
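
For readability, the same test can be pulled out into a named function. The following is a minimal sketch rather than part of the original answer: keep_column, drop_sparse_factors and the min_count parameter are hypothetical names, and df1 is rebuilt by hand from the question's table. It uses vapply() for a type-stable result and drop = FALSE so that a data frame is returned even when only one column survives.

```r
# Hypothetical helper: TRUE if a column should be kept.
# Columns that are neither numeric nor factor (e.g. character) are dropped,
# which mirrors the one-liner above.
keep_column <- function(x, min_count = 3) {
  if (is.numeric(x)) return(TRUE)
  # An unused level has a count of 0 in table(), so such a factor is dropped
  is.factor(x) && nlevels(x) > 0 && min(table(x)) >= min_count
}

# Hypothetical helper: drop sparse factor columns from one data frame
drop_sparse_factors <- function(df, min_count = 3) {
  keep <- vapply(df, keep_column, logical(1), min_count = min_count)
  df[, keep, drop = FALSE]  # drop = FALSE keeps the data frame structure
}

# df1 rebuilt from the table in the question
df1 <- data.frame(
  f1 = c("a", "a", "a", "b", "b", "a", "a"),
  f2 = c("c", "c", "c", "a", "a", "a", "a"),
  f3 = c("x", "x", "y", "x", "x", "y", "y"),
  n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
  n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
  stringsAsFactors = TRUE
)

result <- drop_sparse_factors(df1)
names(result)  # f2, f3, n1, n2
```

Applied to the whole list, this would read lapply(list1, drop_sparse_factors), with the threshold adjustable through min_count.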

To recreate list1, here is the dput output:

  list1 <- 
  list(structure(list(f1 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 
  1L), .Label = c("a", "b"), class = "factor"), f2 = structure(c(2L, 
  2L, 2L, 1L, 1L, 1L, 1L), .Label = c("a", "c"), class = "factor"), 
      f3 = structure(c(1L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("x", 
      "y"), class = "factor"), n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 
      73L), n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L)), .Names = c("f1", 
  "f2", "f3", "n1", "n2"), class = "data.frame", row.names = c(NA, 
  -7L)), structure(list(f4 = structure(c(2L, 2L, 2L, 1L, 2L, 1L, 
  1L), .Label = c("a", "d"), class = "factor"), f5 = structure(c(3L, 
  2L, 3L, 1L, 1L, 1L, 1L), .Label = c("a", "b", "c"), class = "factor"), 
      n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L), n2 = c(5L, 65L, 
      90L, 6L, 11L, 39L, 22L), n3 = c(41L, 14L, 51L, 85L, 7L, 1L, 
      16L)), .Names = c("f4", "f5", "n1", "n2", "n3"), class = "data.frame", row.names = c(NA, 
  -7L)))
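
As an alternative to logical column indexing, base R's Filter() can select columns directly, since a data frame is a list of columns. The sketch below is an assumption-labeled variant, not the original answer's method. It rebuilds the example list compactly with data.frame() (same values as the dput output) so that it runs on its own.

```r
# Compact rebuild of the example list (same values as the dput output)
list1 <- list(
  data.frame(
    f1 = c("a", "a", "a", "b", "b", "a", "a"),
    f2 = c("c", "c", "c", "a", "a", "a", "a"),
    f3 = c("x", "x", "y", "x", "x", "y", "y"),
    n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
    n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
    stringsAsFactors = TRUE
  ),
  data.frame(
    f4 = c("d", "d", "d", "a", "d", "a", "a"),
    f5 = c("c", "b", "c", "a", "a", "a", "a"),
    n1 = c(12L, 5L, 21L, 45L, 33L, 5L, 73L),
    n2 = c(5L, 65L, 90L, 6L, 11L, 39L, 22L),
    n3 = c(41L, 14L, 51L, 85L, 7L, 1L, 16L),
    stringsAsFactors = TRUE
  )
)

# Filter() keeps the columns for which the predicate is TRUE
# and returns a data frame, so no drop = FALSE is needed
list2 <- lapply(list1, function(df) {
  Filter(function(x) is.numeric(x) || (is.factor(x) && min(table(x)) >= 3), df)
})

lapply(list2, names)  # f2, f3, n1, n2 and f4, n1, n2, n3
```

The resulting column names match the desired list2 from the question.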