Question

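I have the following data frame in R:

unqA   unqB   unqC   totA   totB    totC
 3       5      8      16    12      9
 5       3      2       8     5      4

Using dplyr, I want to filter rows based on the row sums of different groups of columns.

I want the rows where sum(all unq) <= 0.10 * sum(all tot).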

I tried something like:

filter(df, rowsum(matches("unq")) <= 0.10*rowsum(matches("totalC")))

Or:

filter(df, rowsum(unqA, unqB..) <= 0.10*rowsum(totA, totB..))

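I want to select only the rows in which the sum of the unique counts is <= 10% of the sum of the total counts.

However, it either does not work or returns a data frame with no rows.

Any suggestions?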

Answer 1

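This solution takes a similar approach to @SamuelReuther's answer, except that I use mutate. Also, as I understand the question, none of the rows in the sample data satisfy the filter (16 > 0.1 * 37 and 10 > 0.1 * 17), so I added an extra row for which the filter condition is TRUE.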

library(tidyverse)
df <- read_table("unqA   unqB   unqC   totA   totB    totC
3       5      8      16    12      9
5       3      2       8     5      4
1       4      3      30    45     25")

df <- df %>% 
  mutate(sum_unq = rowSums(select(., starts_with("unq"))),
         sum_tot = rowSums(select(., starts_with("tot"))))
df  
#> # A tibble: 3 x 8
#>    unqA  unqB  unqC  totA  totB  totC sum_unq sum_tot
#>   <int> <int> <int> <int> <int> <int>   <dbl>   <dbl>
#> 1     3     5     8    16    12     9      16      37
#> 2     5     3     2     8     5     4      10      17
#> 3     1     4     3    30    45    25       8     100
df %>% filter(sum_unq <= 0.1 * sum_tot)
#> # A tibble: 1 x 8
#>    unqA  unqB  unqC  totA  totB  totC sum_unq sum_tot
#>   <int> <int> <int> <int> <int> <int>   <dbl>   <dbl>
#> 1     1     4     3    30    45    25       8     100
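
As a variation on the approach above, the two row sums can also be computed directly inside filter(), so that no helper columns are added. This is only a sketch: it assumes the magrittr pipe (so that `.` refers to the piped-in data frame) and rebuilds the three sample rows with data.frame() instead of read_table(). Note also that base R's rowsum(), used in the question's attempts, sums rows by a grouping variable; rowSums() is the row-wise function.

```r
library(dplyr)

# Same three rows as in the answer above, built with data.frame()
# so that no text parsing is involved.
df <- data.frame(unqA = c(3, 5, 1),
                 unqB = c(5, 3, 4),
                 unqC = c(8, 2, 3),
                 totA = c(16, 8, 30),
                 totB = c(12, 5, 45),
                 totC = c(9, 4, 25))

# Compute both row sums inside filter(); `.` is the piped-in data frame.
res <- df %>%
  filter(rowSums(select(., starts_with("unq"))) <=
           0.1 * rowSums(select(., starts_with("tot"))))
res
```

With these three rows, only the third one (8 <= 0.1 * 100) should be kept, and the original six columns are returned unchanged.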

Answer 2

OK, I tried something and hope it works for you (I am not entirely sure I understood your question):

Here is your example data frame:

df <- data.frame(unqA = c(3, 5),
                 unqB = c(5, 3),
                 unqC = c(8, 2),
                 totA = c(16, 8),
                 totB = c(12, 5),
                 totC = c(9, 4))

As a first step, I compute the extra columns that are needed:

library(dplyr)
df_ext <- cbind(df,
  rowSums_unq = df %>%
    select(matches("unq")) %>%
    rowSums(),
  rowSums_tot = df %>%
    select(matches("tot")) %>%
    rowSums())

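This gives:

  unqA unqB unqC totA totB totC rowSums_unq rowSums_tot
1    3    5    8   16   12    9          16          37
2    5    3    2    8    5    4          10          17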

Then filter the data frame and, finally, drop the columns that are no longer needed:

df_ext %>%
  filter(rowSums_unq <= 0.1 * rowSums_tot) %>%
  select(-rowSums_unq, -rowSums_tot)
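
If the same filter is needed with other column prefixes or thresholds, the two steps above could be wrapped in a small function. The sketch below is an illustration, not something dplyr provides: filter_by_share and its arguments num_prefix, den_prefix and share are hypothetical names chosen for this example.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows whose summed
# `num_prefix` columns are at most `share` times their summed
# `den_prefix` columns.
filter_by_share <- function(data, num_prefix, den_prefix, share = 0.1) {
  num <- rowSums(select(data, starts_with(num_prefix)))
  den <- rowSums(select(data, starts_with(den_prefix)))
  # Base subsetting avoids name clashes with columns of `data`.
  data[num <= share * den, , drop = FALSE]
}

# The two-row example data frame from the question.
df <- data.frame(unqA = c(3, 5),
                 unqB = c(5, 3),
                 unqC = c(8, 2),
                 totA = c(16, 8),
                 totB = c(12, 5),
                 totC = c(9, 4))

none <- filter_by_share(df, "unq", "tot")               # 10% threshold
half <- filter_by_share(df, "unq", "tot", share = 0.5)  # 50% threshold
```

On the question's sample data the 10% threshold keeps no rows, which matches the empty result the asker reported; with share = 0.5 only the first row qualifies (16 <= 18.5, while 10 > 8.5).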

Select rows in an R data frame using conditional rowSums (dplyr approach)
