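Filter df groups when every row of the group meets a condition in at least N columns

Date: 2017-11-23 11:40:42

Tags: r dplyr apply

I have a small problem with my df. First I will show you an example, then explain what I would like to get.

My input df: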

C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I
B   D   D   I   X   I   I   I
B   D   I   NC  X   I   I   D
C   NC  I   I   X   NC  D   I
C   I   I   I   X   I   I   I
C   I   I   I   X   I   I   D
D   NC  NC  I   X   D   D   D
D   I   I   I   X   D   D   I
D   D   D   I   X   I   I   NC
D   I   I   I   X   NC  I   I
E   NC  I   I   X   I   I   D
E   I   I   I   X   I   D   D

Expected result:

C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I

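I would like to keep only the groups (grouped by column 'C1'), with all of their rows, in which every row has at least 2 occurrences of 'I' in columns C2, C3, C4 and at least 2 occurrences of 'I' in columns C6, C7, C8 (as in group A).

I decided to use filter(), all() and rowSums():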

 df_filtered <- df %>%
  group_by(C1) %>%
  filter(all(rowSums(df[,2:4] == 'I' & df[,6:8] == 'I') >= 2))

What is not working? It returns 0 rows and I don't know why...

2 answers:

Answer 0: (score: 2)

Solution

library(dplyr)

df %>%
    # Evaluate the condition row by row, before grouping
    mutate(condition = rowSums(.[2:4] == 'I') >= 2 & rowSums(.[6:8] == 'I') >= 2) %>%
    group_by(C1) %>%
    # Keep a group only if all of its rows satisfy the condition
    filter(all(condition)) %>%
    select(-condition)

Result

# A tibble: 4 x 8
# Groups:   C1 [1]
      C1     C2     C3     C4     C5     C6     C7     C8
  <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
1      A      I      I      D      X      I      I      I
2      A      I      I      I      X      D      I      I
3      A      I      I      I      X      I      I      I
4      A      I      D      I      X     NC      I      I

Explanation

When you use filter(all(rowSums(df[,2:4] == 'I' & df[,6:8] == 'I') >= 2)), the comparison is made on all rows of df, not just on the rows of your group. This method evaluates the condition for each row first and then calls all() only within each group.
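The behaviour described in the explanation can be checked directly. The following is a minimal sketch that uses base R only, so that it runs without dplyr; the names `whole`, `condition`, `keep` and `result` are illustrative and do not come from the original answer, and `ave()` stands in for the `group_by()` plus `filter(all(...))` step.

```r
# Rebuild the example data from the question.
df <- read.table(text = "C1 C2 C3 C4 C5 C6 C7 C8
A I I D X I I I
A I I I X D I I
A I I I X I I I
A I D I X NC I I
B D D I X I I I
B D I NC X I I D
C NC I I X NC D I
C I I I X I I I
C I I I X I I D
D NC NC I X D D D
D I I I X D D I
D D D I X I I NC
D I I I X NC I I
E NC I I X I I D
E I I I X I D D", header = TRUE, stringsAsFactors = FALSE)

# The expression from the question refers to `df` itself, so it always
# sees every row of the data frame, whatever grouping dplyr has applied.
whole <- rowSums(df[, 2:4] == "I" & df[, 6:8] == "I") >= 2
length(whole)  # one value per row of the full data frame
all(whole)     # a single value, identical for every group

# Compute the per-row condition once, then apply all() group by group.
condition <- rowSums(df[, 2:4] == "I") >= 2 & rowSums(df[, 6:8] == "I") >= 2
keep <- as.logical(ave(condition, df$C1, FUN = all))
result <- df[keep, ]
result
```

Because `all(whole)` is a single value computed over the whole data frame, a grouped filter built on it either keeps every group or drops every group, which is consistent with the 0 rows reported in the question.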

Answer 1: (score: 0)

You can try unite() and then filter by regexp. Here it is with your example:

library(tidyverse)

# First loading your data 
data <- read.table(text = "C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I
B   D   D   I   X   I   I   I
B   D   I   NC  X   I   I   D
C   NC  I   I   X   NC  D   I
C   I   I   I   X   I   I   I
C   I   I   I   X   I   I   D
D   NC  NC  I   X   D   D   D
D   I   I   I   X   D   D   I
D   D   D   I   X   I   I   NC
D   I   I   I   X   NC  I   I
E   NC  I   I   X   I   I   D
E   I   I   I   X   I   D   D", header = TRUE)

# Then filtering rows
data %>% 
  # Creating a helper column
  unite(merged, C1:C8, sep = "", remove = FALSE) %>%
  # Filtering by regexp: keep rows of group A that contain two adjacent 'I's
  filter(grepl("^A", merged), grepl("II", merged)) %>%
  # Deleting helper column
  select(-merged)

The result is:

  C1 C2 C3 C4 C5 C6 C7 C8
1  A  I  I  D  X  I  I  I
2  A  I  I  I  X  D  I  I
3  A  I  I  I  X  I  I  I
4  A  I  D  I  X NC  I  I
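Note that the pattern "^A" selects group A by name, so the filter above relies on already knowing which group should survive. As a hedged sketch of how the same string-based idea might be generalised, the code below pastes each block of columns into one string, counts the 'I' characters, and keeps the groups in which every row passes. It uses base R only; the helper `count_I` and the variables `left`, `right`, `row_ok`, `group_ok` and `result` are hypothetical names introduced for illustration.

```r
# Rebuild the example data from the question.
data <- read.table(text = "C1 C2 C3 C4 C5 C6 C7 C8
A I I D X I I I
A I I I X D I I
A I I I X I I I
A I D I X NC I I
B D D I X I I I
B D I NC X I I D
C NC I I X NC D I
C I I I X I I I
C I I I X I I D
D NC NC I X D D D
D I I I X D D I
D D D I X I I NC
D I I I X NC I I
E NC I I X I I D
E I I I X I D D", header = TRUE, stringsAsFactors = FALSE)

# Merge each block of three columns into one string per row.
left  <- paste0(data$C2, data$C3, data$C4)
right <- paste0(data$C6, data$C7, data$C8)

# Count the 'I' characters in a string by deleting everything else.
# This works here because the other values (D, NC) contain no 'I'.
count_I <- function(s) nchar(gsub("[^I]", "", s))

# A row passes when both blocks contain at least two 'I's.
row_ok <- count_I(left) >= 2 & count_I(right) >= 2

# A group passes only when all of its rows pass.
group_ok <- tapply(row_ok, data$C1, all)
ok_groups <- names(group_ok)[as.vector(group_ok)]

result <- data[data$C1 %in% ok_groups, ]
result
```

Counting characters instead of matching "II" also accepts rows where the two 'I's are not adjacent, such as I D I in the last row of group A.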

Have fun ;)