Filtering rows by matching column values against other column values in R

Time: 2019-02-27 21:36:02

Tags: r select filter

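I'm very new to R, so this may be easier than I expect and I'm probably overthinking it. Say I have a data.frame (df) and I want to select the rows that match a criterion in another column, but on top of that, the criterion has to be specific to each group. For example: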

Column1    Column2    Column3 
Name1      Some Val   Criteria1
Name1      Unwanted   Also Unwanted
Name2      Some Val2  Criteria2
Name2      Unwanted   Also Unwanted

This is probably confusing. Basically, I want to select each Some Val based on the matching criterion for each name, so the result should be:

Column1    Column2    Column3
Name1      Some Val   Criteria1
Name2      Some Val2  Criteria2

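The thing is, this would be easy to do if I were selecting only a few names. But I have thousands, which would mean writing out thousands of names and thousands of different criteria.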

2 answers:

Answer 0: (score: 0)

You can use dplyr together with stringr, which provides str_detect

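library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
# Keep the rows whose Column2 contains the text "Some Val"
df %>%
    group_by(Column1) %>%
    filter(str_detect(Column2, "Some Val"))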
## A tibble: 2 x 3
## Groups:   Column1 [2]
#  Column1 Column2   Column3
#  <fct>   <fct>     <fct>
#1 Name1   Some Val  Criteria1
#2 Name2   Some Val2 Criteria2

Sample data

df <- read.table(text =
    "Column1    Column2    Column3
Name1      'Some Val'   Criteria1
Name1      Unwanted   'Also Unwanted'
Name2      'Some Val2'  Criteria2
Name2      Unwanted   'Also Unwanted'", header = TRUE)
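
For comparison, the same pattern filter can be sketched in base R, with no extra packages. This is only an illustration using the sample data above. The name result and the stringsAsFactors = FALSE argument are assumptions added here, not part of the original answer. As in the dplyr version, the rows are chosen by the text in Column2, not by a per-group criterion.

```r
# Rebuild the sample data; stringsAsFactors = FALSE keeps the columns as character
df <- read.table(text =
    "Column1    Column2    Column3
Name1      'Some Val'   Criteria1
Name1      Unwanted   'Also Unwanted'
Name2      'Some Val2'  Criteria2
Name2      Unwanted   'Also Unwanted'", header = TRUE, stringsAsFactors = FALSE)

# grepl() returns TRUE for every row whose Column2 contains the literal text "Some Val"
keep <- grepl("Some Val", df$Column2, fixed = TRUE)

# Logical row indexing keeps only the matching rows
result <- df[keep, ]
print(result)
```

Since the pattern does not depend on the group, no grouping step is needed here.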

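Answer 1: (score: 0)

If you want to select rows from groups based on group-specific criteria, you need some kind of object that specifies the criterion for each group. You can do this with a data.frame (criteria_by_group in the code below).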

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)

df <- tribble(
  ~group_col, ~value_col, ~criteria_col,
  "Name1", "Some Val", "Criteria1",
  "Name1", "Unwanted", "Not Criteria1",
  "Name2", "Some Val2", "Criteria2", 
  "Name2", "Unwanted", "Not Criteria2"
)

criteria_by_group <- tribble(
  ~group_col, ~group_criteria,
  "Name1", "Criteria1",
  "Name2", "Criteria2"
)

df <- left_join(df, criteria_by_group, by = "group_col")

df
#> # A tibble: 4 x 4
#>   group_col value_col criteria_col  group_criteria
#>   <chr>     <chr>     <chr>         <chr>         
#> 1 Name1     Some Val  Criteria1     Criteria1     
#> 2 Name1     Unwanted  Not Criteria1 Criteria1     
#> 3 Name2     Some Val2 Criteria2     Criteria2     
#> 4 Name2     Unwanted  Not Criteria2 Criteria2

df %>%
  group_by(group_col) %>%
  filter(criteria_col == group_criteria[1])
#> # A tibble: 2 x 4
#> # Groups:   group_col [2]
#>   group_col value_col criteria_col group_criteria
#>   <chr>     <chr>     <chr>        <chr>         
#> 1 Name1     Some Val  Criteria1    Criteria1     
#> 2 Name2     Some Val2 Criteria2    Criteria2

Created on 2019-02-27 by the reprex package (v0.2.1)
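
As a possible alternative, the join and the filter could be combined into one step with dplyr's semi_join(), which keeps the rows of df that have a match in the criteria table and adds no columns. This is a minimal sketch, assuming the same column names as above. It builds the data with data.frame() instead of tribble() to avoid an extra dependency, and the name result is hypothetical.

```r
library(dplyr)

# Same example data as above, built with base data.frame()
df <- data.frame(
  group_col    = c("Name1", "Name1", "Name2", "Name2"),
  value_col    = c("Some Val", "Unwanted", "Some Val2", "Unwanted"),
  criteria_col = c("Criteria1", "Not Criteria1", "Criteria2", "Not Criteria2"),
  stringsAsFactors = FALSE
)

# One row per group, giving the criterion that group has to satisfy
criteria_by_group <- data.frame(
  group_col      = c("Name1", "Name2"),
  group_criteria = c("Criteria1", "Criteria2"),
  stringsAsFactors = FALSE
)

# Keep the rows of df whose (group_col, criteria_col) pair appears in
# criteria_by_group as a (group_col, group_criteria) pair
result <- semi_join(
  df, criteria_by_group,
  by = c("group_col" = "group_col", "criteria_col" = "group_criteria")
)
print(result)
```

With thousands of groups, only criteria_by_group has to grow (for example, by reading it from a file). The filtering code stays the same.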