Remove all columns whose sum is zero using dplyr

Time: 2015-12-03 07:17:00

Tags: r dplyr

I am currently working with a data frame like this:

Site  Spp1  Spp2  Spp3  LOC  TYPE
S01   2     4     0     A    FLOOD
S02   4     0     0     A    REG
....
S10   0     1     0     B    FLOOD
S11   1     0     0     B    REG

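What I am trying to do is subset the data frame so that I can run some indicator species analyses in R.

The following code works: I create two subsets of the data, combine them into a single data frame, and then drop the unused factor levels.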

A.flood <- filter(data, TYPE == "FLOOD", LOC == "A")
B.flood <- filter(data, TYPE == "FLOOD", LOC == "B")
# Combine the two subsets, then drop factor levels that no longer occur
A.B.flood <- rbind(A.flood, B.flood) %>% droplevels()

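What I would also like/need to do is remove all Spp columns (there are ~60 in my real dataset) whose sum is zero. Is there a way to do this with dplyr, and if so, can that code be piped into the existing A.B.flood data frame code?

Thanks!

Edit

I managed to drop all columns whose sum is zero by selecting only the columns whose sum is > zero: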

A.B.flood.subset <- A.B.flood[, apply(A.B.flood[1:(ncol(A.B.flood))], 2, sum)!=0]

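5 Answers:

Answer 0 (score: 5)

Without using any packages, we can take the rowSums of the 'Spp' columns (using grep to pick out those columns) and double-negate the result, so that rows with sum > 0 are TRUE and the others FALSE. Use this index to subset the rows.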

data[!!rowSums(data[grep('Spp', names(data))]),]

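Or, using dplyr/magrittr, we select the 'Spp' columns, get the sum of each row with Reduce, double-negate it, and use extract from magrittr to subset the original dataset with the derived index.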

library(dplyr)
library(magrittr)
data %>%
    select(matches('^Spp')) %>%
    Reduce(`+`, .) %>%
    `!` %>%
    `!` %>%
     extract(data,.,)

Data

data <- structure(list(Site = c("S01", "S02", "S03", "S04"), 
Spp1 = c(2L, 
4L, 0L, 4L), Spp2 = c(4L, 0L, 0L, 0L), Spp3 = c(0L, 0L, 0L, 0L
), LOC = c("A", "A", "A", "A"), TYPE = c("FLOOD", "REG", 
"FLOOD", 
"REG")), .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC", 
"TYPE"), class = "data.frame", row.names = c(NA, -4L))

Answer 1 (score: 3)

I realize this question is quite old by now, but I arrived at another solution using dplyr's "select" together with "which", which may seem clearer to dplyr fans.

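A minimal sketch of what a select-plus-which approach might look like. This is an illustration under assumptions, not the answer author's code: the helper names `is_spp`, `is_zero`, and `zero_spp` are hypothetical, the data is the example data from Answer 0, and the Spp columns are assumed to contain no NA values.

```r
library(dplyr)

# Example data (same values as in Answer 0)
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG"),
  stringsAsFactors = FALSE
)

# Which columns are species columns, and which numeric columns sum to zero?
is_spp  <- grepl("^Spp", names(A.B.flood))
is_zero <- sapply(A.B.flood, function(x) is.numeric(x) && sum(x) == 0)

# Names of the species columns to drop
zero_spp <- names(A.B.flood)[which(is_spp & is_zero)]

# Drop them by name; all other columns are kept in their original order
A.B.flood.subset <- A.B.flood %>% select(-all_of(zero_spp))
names(A.B.flood.subset)
```

Selecting by name through `all_of()` avoids ambiguity between column names and external vectors inside `select()`.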

Answer 2 (score: 2)

You should use tidyr::gather() to convert to tidy data; the data frame will then be much easier to manipulate.

library(tidyr)
library(dplyr)
A.B.flood %>% gather(Species, Sp.Count, -Site, -LOC, -TYPE) %>%
              group_by(Species) %>% 
              filter(Sp.Count > 0)

Voila, your tidy data minus the zero counts.

#    Site    LOC   TYPE Species Sp.Count
#  <fctr> <fctr> <fctr>   <chr>    <int>
#1    S01      A  FLOOD    Spp1        2
#2    S02      A    REG    Spp1        4
#3    S11      B    REG    Spp1        1
#4    S01      A  FLOOD    Spp2        4
#5    S10      B  FLOOD    Spp2        1

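Personally, I would keep it like this. If you want the original format back, with zero counts for the species that were not dropped, just add %>% spread(Species, Sp.Count, fill = 0) to the pipeline.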

#    Site    LOC   TYPE  Spp1  Spp2
#* <fctr> <fctr> <fctr> <dbl> <dbl>
#1    S01      A  FLOOD     2     4
#2    S02      A    REG     4     0
#3    S10      B  FLOOD     0     1
#4    S11      B    REG     1     0
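In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same round trip with the newer functions, assuming tidyr >= 1.0.0 and the example data from Answer 0. It differs from the code in this answer in one respect: it filters on each species' total rather than on individual counts, so sites whose counts are all zero stay in the result. The names `tidy` and `wide` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Example data (same values as in Answer 0)
A.B.flood <- data.frame(
  Site = c("S01", "S02", "S03", "S04"),
  Spp1 = c(2L, 4L, 0L, 4L),
  Spp2 = c(4L, 0L, 0L, 0L),
  Spp3 = c(0L, 0L, 0L, 0L),
  LOC  = c("A", "A", "A", "A"),
  TYPE = c("FLOOD", "REG", "FLOOD", "REG"),
  stringsAsFactors = FALSE
)

# Long format: one row per Site/Species pair
tidy <- A.B.flood %>%
  pivot_longer(starts_with("Spp"),
               names_to = "Species", values_to = "Sp.Count") %>%
  group_by(Species) %>%
  # Keep every row of a species whose total count is non-zero
  filter(sum(Sp.Count) > 0) %>%
  ungroup()

# Back to wide format, now without the all-zero species
wide <- tidy %>%
  pivot_wider(names_from = Species, values_from = Sp.Count)
names(wide)
```

Because whole species are kept or dropped, no `fill` value is needed when widening: every remaining Site/Species pair still has a count.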

Answer 3 (score: 1)

There is a simpler and faster way (and one more in line with your question: using dplyr).

# Keep every column that has at least one non-zero entry
A.B.flood.subset <- A.B.flood %>% .[, colSums(. != 0) > 0]

Answer 4 (score: 1)

For those who want to use dplyr 1.0.0 with the where() selection helper, you can do this:

A.B.flood %>% 
  select(where( ~ is.numeric(.x) && sum(.x) != 0))

Returns:

  Spp1 Spp2
1    2    4
2    4    0
3    0    0
4    4    0

Using the same data given by @akrun:

A.B.flood <- structure(
  list(
    Site = c("S01", "S02", "S03", "S04"),
    Spp1 = c(2L,
             4L, 0L, 4L),
    Spp2 = c(4L, 0L, 0L, 0L),
    Spp3 = c(0L, 0L, 0L, 0L),
    LOC = c("A", "A", "A", "A"),
    TYPE = c("FLOOD", "REG",
             "FLOOD",
             "REG")
  ),
  .Names = c("Site", "Spp1", "Spp2", "Spp3", "LOC",
             "TYPE"), class = "data.frame", row.names = c(NA, -4L))