Flagging rows that meet a condition based on another column

Date: 2018-02-03 07:31:28

Tags: r

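I have read through several questions without finding code that works, so thank you for your help. This is a refinement of an earlier question; I could do this in Excel, but I'm trying to get up to speed with R.

I have some sales data that is giving me a headache: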

date    sales
14/11   39
14/11   3.2
14/11   13
14/11   8.3
14/11   5
14/11   5.6
14/11   79
14/11   35
14/11   24
14/11   8.1
14/11   21
14/11   40
14/11   50
14/11   82
15/11   8.3
15/11   7.2
15/11   63
15/11   31
15/11   35
15/11   2.1
15/11   31
15/11   11
15/11   3.8
15/11   29
15/11   NA

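I have already been shown how to group by date and find the bottom three performers, but I would like the rest of the data to stay visible.

I would like an extra column that is TRUE for the three lowest-ranked sales within each date, and FALSE otherwise.

I tried: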

if(data$sales == group_by(data$date)%>%top_n(n=-3, wt=sales)) {
data$top <- T
} else {
dat$top <- F
}

All I get is:

Error in UseMethod("group_by_") : 
no applicable method for 'group_by_' applied to an object of class "factor"

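This is not my first attempt - I have tried loops, if/else, match and %in% and have really struggled, but I didn't want to dump a pile of bad code here.

Any ideas would be much appreciated.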

2 answers:

Answer 0 (score: 0)

Hope this helps!

library(dplyr)

df %>%
  group_by(date) %>%
  arrange(date, sales) %>%
  mutate(bottom3_performer = row_number() <=3)

Output is:

    date sales bottom3_performer
1  14/11   3.2              TRUE
2  14/11   5.0              TRUE
3  14/11   5.6              TRUE
4  14/11   8.1             FALSE
5  14/11   8.3             FALSE
6  14/11  13.0             FALSE
7  14/11  21.0             FALSE
...

Sample data:

df <- structure(list(date = c("14/11", "14/11", "14/11", "14/11", "14/11", 
"14/11", "14/11", "14/11", "14/11", "14/11", "14/11", "14/11", 
"14/11", "14/11", "15/11", "15/11", "15/11", "15/11", "15/11", 
"15/11", "15/11", "15/11", "15/11", "15/11", "15/11"), sales = c(39, 
3.2, 13, 8.3, 5, 5.6, 79, 35, 24, 8.1, 21, 40, 50, 82, 8.3, 7.2, 
63, 31, 35, 2.1, 31, 11, 3.8, 29, NA)), .Names = c("date", "sales"
), class = "data.frame", row.names = c(NA, -25L))

Another set of sample data and its output:

df <- structure(list(date = c("14/11", "14/11", "14/11", "14/11", "14/11", 
"14/11", "14/11", "14/11", "14/11", "14/11", "14/11", "14/11", 
"14/11", "14/11", "15/11", "15/11", "15/11", "15/11", "15/11", 
"15/11", "15/11", "15/11", "15/11", "15/11", "15/11"), sales = c(39, 
3.2, 13, 8.3, 5, 5.6, 79, 35, 24, 8.1, 21, 40, 50, 82, 8.3, 7.2, 
63, 31, 35, 2.1, 31, 11, 3.8, 29, NA), id = 1:25, name = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y")), .Names = c("date", 
"sales", "id", "name"), row.names = c(NA, -25L), class = "data.frame")

    date sales    id  name bottom3_performer
 1 14/11   3.2     2     b              TRUE
 2 14/11   5.0     5     e              TRUE
 3 14/11   5.6     6     f              TRUE
 4 14/11   8.1    10     j             FALSE
 5 14/11   8.3     4     d             FALSE
 6 14/11  13.0     3     c             FALSE
 7 14/11  21.0    11     k             FALSE
...
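
As a side note on why the attempt in the question fails: group_by() expects a data frame as its first argument, so passing the column data$date (a factor) triggers the "no applicable method" error, and if() only accepts a single TRUE/FALSE rather than a whole column. Below is a minimal sketch of the same grouped approach that leaves the rows in their original order. It assumes the data frame is called df as in the sample data above; the column name bottom3 and the use of min_rank() with an explicit NA guard are my own choices for illustration, not something the question asked for.

```r
library(dplyr)

# Same values as the sample data above, built with rep() for brevity
df <- data.frame(
  date  = rep(c("14/11", "15/11"), times = c(14, 11)),
  sales = c(39, 3.2, 13, 8.3, 5, 5.6, 79, 35, 24, 8.1, 21, 40, 50, 82,
            8.3, 7.2, 63, 31, 35, 2.1, 31, 11, 3.8, 29, NA)
)

result <- df %>%
  group_by(date) %>%                       # pass the data frame, then name the column
  mutate(
    # min_rank() ranks within each date; the NA guard turns a missing
    # sale into FALSE instead of NA (an assumption about what is wanted)
    bottom3 = !is.na(sales) & min_rank(sales) <= 3
  ) %>%
  ungroup()                                # drop the grouping before further use

# Row positions of the flagged sales, in the original row order
which(result$bottom3)
```

Because nothing is sorted, the flag can be read alongside the data exactly as it was entered; add arrange(date, sales) at the end if a sorted view is preferred.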

Answer 1 (score: 0)

This should do it:

library(dplyr)

df %>% 
  group_by(date) %>% 
  mutate(bottom3 = rank(sales) <= 3)

# A tibble: 25 x 3
# Groups:   date [2]
   date  sales bottom3
   <chr> <dbl> <lgl>  
 1 15/11  2.10 T      
 2 14/11  3.20 T      
 3 15/11  3.80 T      
 4 14/11  5.00 T      
 5 14/11  5.60 T      
 6 15/11  7.20 T      
 7 14/11  8.10 F      
 8 14/11  8.30 F      
 9 15/11  8.30 F      
10 15/11 11.0  F      
# ... with 15 more rows
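
One thing worth checking with rank(): by default it keeps NA as NA and gives tied values their average rank. So the NA row on 15/11 would get NA rather than FALSE, and a tie at third place can leave fewer than three rows flagged. The sketch below uses a small made-up vector (not the sales data) to show the difference; switching to ties.method = "min" and treating NA as FALSE are assumptions about the desired behaviour, not part of the answer above.

```r
# Toy values for illustration only, with a tie at third place and one NA
x <- c(1, 2, 3, 3, NA)

# Default: ties.method = "average", na.last = "keep"
# ranks are 1, 2, 3.5, 3.5, NA
avg_flag <- rank(x) <= 3

# Ties share the lowest rank (1, 2, 3, 3, NA); NA is then forced to FALSE
min_ranks <- rank(x, ties.method = "min", na.last = "keep")
min_flag  <- !is.na(x) & min_ranks <= 3

avg_flag  # TRUE TRUE FALSE FALSE NA
min_flag  # TRUE TRUE TRUE TRUE FALSE
```

In the sales data above there are no ties among the three lowest values on either date, so both variants flag the same rows there; only the NA row differs.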