Question

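For a data frame, I want to get or select the names of all columns that contain negative values within a certain range. This post comes very close, but it iterates over the rows, which is not feasible for my data. Also, if I store that solution it becomes a list, whereas I would prefer a vector. For example, for the following dataset: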

library(data.table)
df <- fread(
     "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012",
  header = TRUE
)

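I want all columns that contain a negative value between -10 and 0 (A and D in the example). What is the simplest way to achieve this? All else being equal, I would prefer a data.table solution.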

Answer 1

One tidyverse possibility could be:

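library(dplyr)
library(tidyr)

# Reshape every column except iso and year (positions 5 and 6) to long format,
# then flag each variable that has at least one value strictly between -10 and 0.
df %>%
  gather(var, val, -c(5:6)) %>%
  group_by(var) %>%
  summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))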

  var   res  
  <chr> <lgl>
1 A     TRUE 
2 B     FALSE
3 D     TRUE 
4 E     FALSE

To select only the numeric columns:

df %>%
 select_if(is.numeric) %>%
 gather(var, val) %>%
 group_by(var) %>%
 summarise(res = any(val[!is.na(val)] > -10 & val[!is.na(val)] < 0))

Note that the year column is also included here, because it is numeric.
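If identifier-like numeric columns such as year should not be tested at all, they can be dropped by name before the check. A minimal base R sketch, assuming the sample data from the question; the vector `id_cols` and the helper `has_small_negative` are names introduced here for illustration only:

```r
# Sample data shaped like the question's, as a plain data.frame.
df <- data.frame(
  A    = c(0, 1, 0, 1, 0, -1, 0, 1),
  B    = c(1, 0, 0, 0, 1, 0, 0, 0),
  D    = c(1, 2, -3, 4, 7, 1, 3, 4),
  E    = c(NA, 0, 0, 0, NA, 0, 2, NA),
  iso  = c("ECU", "ECU", "BRA", "BRA", "ECU", "ECU", "BRA", "BRA"),
  year = c(2009, 2009, 2011, 2011, 2008, 2008, 2012, 2012),
  stringsAsFactors = FALSE
)

# Hypothetical choice: columns that should never be tested.
id_cols <- c("iso", "year")
candidates <- setdiff(names(df), id_cols)

# TRUE if any non-missing value lies strictly between -10 and 0.
has_small_negative <- function(x) {
  is.numeric(x) && any(x > -10 & x < 0, na.rm = TRUE)
}

flags <- vapply(df[candidates], has_small_negative, logical(1))
result <- names(flags)[flags]
print(result)
```

Here `result` is a character vector, not a list or a tibble.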

You can also do this with base R:

df <- Filter(is.numeric, df)
# Test both bounds on the same value, then flag columns with at least one hit.
cond <- colSums(df > -10 & df < 0, na.rm = TRUE) > 0
colnames(df)[cond]

[1] "A" "D"
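The range test has to hold for a single value, not for the column as a whole: a column containing -20 and 5 has one value above -10 and one value below 0, yet nothing inside the range. A small sketch of that edge case, using a made-up column `X` that is not part of the sample data:

```r
df2 <- data.frame(
  A = c(0, -1, 1),     # -1 lies strictly between -10 and 0
  X = c(-20, 5, NA)    # hypothetical column: no value between -10 and 0
)

# Combine both comparisons element-wise first, then count hits per column.
in_range <- df2 > -10 & df2 < 0
hits <- colSums(in_range, na.rm = TRUE)

cols <- names(hits)[hits > 0]
print(cols)
```

Only `A` is returned, because `X` never satisfies both bounds with the same value.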

Or written as a one-liner:

df <- Filter(is.numeric, df)
colnames(df)[colSums(df > -10 & df < 0, na.rm = TRUE) > 0]

Sample data:

df <- read.table(text = 
 "A   B   D   E  iso   year   
      0   1   1   NA ECU   2009   
      1   0   2   0  ECU   2009   
      0   0   -3  0  BRA   2011   
      1   0   4   0  BRA   2011   
      0   1   7   NA ECU   2008   
     -1   0   1   0  ECU   2008   
      0   0   3   2  BRA   2012   
      1   0   4   NA BRA   2012", 
 header = TRUE,
 stringsAsFactors = FALSE)
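
The question states a preference for data.table, so here is a sketch of the same check in that idiom. It assumes the data.table package is installed and rebuilds a shortened version of the sample with `data.table()` rather than `fread()`; `num_cols`, `flags` and `neg_cols` are illustrative names:

```r
library(data.table)

# Shortened, hypothetical version of the sample data.
dt <- data.table(
  A    = c(0L, 1L, 0L, -1L),
  B    = c(1L, 0L, 0L, 0L),
  D    = c(1L, 2L, -3L, 1L),
  E    = c(NA, 0L, 0L, 2L),
  iso  = c("ECU", "ECU", "BRA", "ECU"),
  year = c(2009L, 2009L, 2011L, 2008L)
)

# Restrict the check to numeric columns so iso is never compared to a number.
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# With no `by`, an atomic result in j comes back as a plain named vector.
flags <- dt[, vapply(.SD, function(x) any(x > -10 & x < 0, na.rm = TRUE), logical(1)),
            .SDcols = num_cols]

neg_cols <- names(flags)[flags]
print(neg_cols)
```

The result is a character vector rather than a list, which is what the question asks for.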

Answer 2

Another tidyverse variant:

library(dplyr)
library(purrr)

df %>%
  group_by(iso, year) %>%
  keep(~any(.x > -10 & .x < 0 & !is.na(.x))) %>%
  names()
[1] "A" "D"

EDIT: To handle factors, use mutate_if. We can proceed in a similar way (although I think grouping would be better):

df %>%
  mutate_if(is.factor, as.character) %>%
  purrr::keep(~any(.x > -10 & .x < 0 & !is.na(.x))) %>%
  names()
[1] "A" "D"
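
An alternative to converting factors is to skip every non-numeric column inside the predicate itself. A base R sketch of that idea with `Filter()`, assuming a small data frame whose `iso` column is a factor; `df3` and `in_range` are names introduced here:

```r
df3 <- data.frame(
  A   = c(0, -1, 1),
  B   = c(1, 0, 0),
  D   = c(-3, 4, 7),
  iso = factor(c("ECU", "BRA", "ECU"))
)

# && short-circuits, so the comparison never runs on factor or character columns.
in_range <- function(x) {
  is.numeric(x) && any(x > -10 & x < 0, na.rm = TRUE)
}

# Filter() keeps the columns for which the predicate is TRUE.
kept <- names(Filter(in_range, df3))
print(kept)
```

Because the factor column is rejected before any comparison takes place, no conversion step or grouping is needed in this variant.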

Values:

df %>%
  group_by(iso, year) %>%
  keep(~any(.x > -10 & .x < 0 & !is.na(.x)))
# A tibble: 8 x 2
      A     D
  <int> <int>
1     0     1
2     1     2
3     0    -3
4     1     4
5     0     7
6    -1     1
7     0     3
8     1     4

Select/get the names of all columns that have negative values between 0 and 10
