Filter values in a data frame using group_by or filter?

Asked: 2018-03-21 07:52:36

Tags: r filter group-by dplyr

Hello, I have the following example data frame, df:

#First Name Second Name Subject Score
#Harry  Kane    Biology 0
#Harry  Kane    Physics 25
#Harry  Kane    Math    19
#Harry  Kane    Social  16
#Harry  Kane    History 19

#Tom    Hault   Biology 0
#Tom    Hault   Physics 22
#Tom    Hault   Math    24
#Tom    Hault   Social  25
#Tom    Hault   History 20

#Ben    Capario Biology 0
#Ben    Capario Physics 12
#Ben    Capario Math    15
#Ben    Capario Social  16
#Ben    Capario History 18

#Phil   Adams   Biology 20
#Phil   Adams   Physics 22
#Phil   Adams   Math    17
#Phil   Adams   Social  15
#Phil   Adams   History 18

#Shawn  Salzensky   Biology 25
#Shawn  Salzensky   Physics 22
#Shawn  Salzensky   Math    18
#Shawn  Salzensky   Social  19
#Shawn  Salzensky   History 12

Each person has a first name, a second name, and a score for each subject.

I am trying to get output in this format:

df1

#First Name Second Name Subject Score
#Harry  Kane    Biology 0
#Harry  Kane    Physics 25
#Tom    Hault   Biology 0
#Tom    Hault   Physics 22
#Ben    Capario Biology 0
#Ben    Capario Physics 12

I tried this:

 df1 <- filter(df, {Subject=='Biology'&`Score`== 0} | {Subject=='Physics'&`Score`!= 0})

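However, the Subject and Score conditions seem to be returned separately: I get the rows containing Biology with their respective scores, as well as the rows where Score == 0.

Is there another way to do this?

1 Answer:

Answer 0: (score: 1)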

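If what you want is all cases (rows) where the Biology Score is 0, plus all cases (rows) where the Physics Score is not 0, then your code works as written. Some formatting suggestions, though: use () to enclose logical expressions. In dplyr calls, don't quote variable names unless they contain spaces (in which case wrap them in backticks). Don't mix quotes and backticks.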

df1 <- filter(df, (Subject == 'Biology' & Score == 0) | (Subject == 'Physics' & Score != 0))
df1
#   First.Name Second.Name Subject Score
# 1      Harry        Kane Biology     0
# 2      Harry        Kane Physics    25
# 3        Tom       Hault Biology     0
# 4        Tom       Hault Physics    22
# 5        Ben     Capario Biology     0
# 6        Ben     Capario Physics    12
# 7       Phil       Adams Physics    22
# 8      Shawn   Salzensky Physics    22
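
Note that this result also contains the Physics rows for Phil and Shawn, whose Biology score is not 0, while the df1 shown in the question does not. If the goal is to keep only students who have a Biology score of 0, one option is a grouped filter. The following is a minimal sketch, assuming the column names First.Name and Second.Name that data.frame() produces; df_grouped is a name chosen here for illustration only.

```r
library(dplyr)

# Same data as in the "Data" section, with syntactic column names
df <- data.frame(
  First.Name = rep(c("Harry", "Tom", "Ben", "Phil", "Shawn"), each = 5),
  Second.Name = rep(c("Kane", "Hault", "Capario", "Adams", "Salzensky"), each = 5),
  Subject = rep(c("Biology", "Physics", "Math", "Social", "History"), times = 5),
  Score = c(0, 25, 19, 16, 19, 0, 22, 24, 25, 20, 0, 12, 15, 16, 18,
            20, 22, 17, 15, 18, 25, 22, 18, 19, 12),
  stringsAsFactors = FALSE
)

df_grouped <- df %>%
  group_by(First.Name, Second.Name) %>%
  # keep only students who have at least one Biology row with Score 0
  filter(any(Subject == "Biology" & Score == 0)) %>%
  ungroup() %>%
  # within those students, keep the Biology-0 rows and the non-zero Physics rows
  filter((Subject == "Biology" & Score == 0) | (Subject == "Physics" & Score != 0))

df_grouped
```

The first filter() is evaluated per student because of group_by(), so any() decides whether the whole group is kept; the second filter() runs after ungroup() and works row by row, as in the ungrouped call above.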

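I'd add that the data may look odd, but it is tidy. Each row is one observation of a score. That is how you want your data, even if it doesn't look like a report card.

Data: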

df <- data.frame("First Name" = rep(c("Harry", "Tom", "Ben", "Phil", "Shawn"), each = 5),
                 "Second Name" = rep(c("Kane", "Hault", "Capario", "Adams", "Salzensky"), each = 5),
                 Subject = rep(c("Biology", "Physics", "Math", "Social", "History"), times = 5),
                 Score = c(0, 25, 19, 16, 19, 0, 22, 24, 25, 20, 0, 12, 15, 16, 18, 20, 22, 17, 15, 18, 25, 22, 18, 19, 12),
                 stringsAsFactors = FALSE)
df
   First.Name Second.Name Subject Score
1       Harry        Kane Biology     0
2       Harry        Kane Physics    25
3       Harry        Kane    Math    19
4       Harry        Kane  Social    16
5       Harry        Kane History    19
6         Tom       Hault Biology     0
7         Tom       Hault Physics    22
8         Tom       Hault    Math    24
9         Tom       Hault  Social    25
10        Tom       Hault History    20
11        Ben     Capario Biology     0
12        Ben     Capario Physics    12
13        Ben     Capario    Math    15
14        Ben     Capario  Social    16
15        Ben     Capario History    18
16       Phil       Adams Biology    20
17       Phil       Adams Physics    22
18       Phil       Adams    Math    17
19       Phil       Adams  Social    15
20       Phil       Adams History    18
21      Shawn   Salzensky Biology    25
22      Shawn   Salzensky Physics    22
23      Shawn   Salzensky    Math    18
24      Shawn   Salzensky  Social    19
25      Shawn   Salzensky History    12
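
In the output above, data.frame() has turned "First Name" into First.Name, which is why no backticks are needed in the filter() call. As a small sketch of the other case, assuming you build the data frame with check.names = FALSE so that the space is kept (df_spaces and harry are illustrative names, not part of the original post), the column then has to be wrapped in backticks:

```r
library(dplyr)

# check.names = FALSE keeps "First Name" exactly as written
df_spaces <- data.frame(
  "First Name" = c("Harry", "Tom"),
  Subject = c("Biology", "Physics"),
  Score = c(0, 22),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

names(df_spaces)

# A column name containing a space must be referred to with backticks
harry <- filter(df_spaces, `First Name` == "Harry" & Score == 0)
harry
```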