Question

I have a dataset that looks like this:

id  profile_id   company   product    price   
1      1           A        book      10.42
2      1           A        shirt     23.91
3      1           A        cup        5.95
4      2           B        book       7.99
5      2           B        shirt      5.95 
6      2           B        cup       11.76

I want to create a new column, "rank", that ranks each product by price within each company and profile_id.

The output should look like this:

id  profile_id   company   product    price    rank 
1      1           A        book      10.42     2
2      1           A        shirt     23.91     3
3      1           A        cup        5.95     1
4      2           B        book       7.99     2
5      2           B        shirt      5.95     1
6      2           B        cup       11.76     3

I feel like this should be easy, but I can't quite get it to work... Any help would be greatly appreciated!

Reproducible code:

df2 <- data.frame(id=c(1,2,3,4,5,6),
                  profile_id = c(1, 1, 1, 2, 2,2), 
                  company = c("A","A","A","B","B","B"), 
                  product = c("book", "shirt", "cup","book", "shirt", "cup"),
                  price = c(10.42, 23.91, 5.95, 7.99, 5.95, 11.76))

Answer 1

First group_by the company and profile_id variables, then apply rank():

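library(dplyr)
# rank prices within each company/profile_id group (1 = cheapest)
df2 %>% group_by(company, profile_id) %>% mutate(rank = rank(price))

# or, with data.table
library(data.table)
setDT(df2)  # convert to a data.table by reference so := works
df2[, rank := rank(price), by = .(company, profile_id)]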

#     id profile_id company product price  rank
#1     1          1       A    book 10.42     2
#2     2          1       A   shirt 23.91     3
#3     3          1       A     cup  5.95     1
#4     4          2       B    book  7.99     2
#5     5          2       B   shirt  5.95     1
#6     6          2       B     cup 11.76     3

Answer 2

We can do this in base R:

df2$rank <- with(df2, ave(price, company, profile_id, FUN = rank))
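
As a rough sketch of how the ave() approach behaves, the snippet below rebuilds the question's data and computes the rank per group. The descending column `rank_desc` and the helper `rank_min` are hypothetical names added here for illustration, not part of the original answer. They only show two variations you might want: ranking from most expensive to cheapest, and giving tied prices the same integer rank instead of the default averaged rank.

```r
# Rebuild the question's data so the sketch is self-contained.
df2 <- data.frame(id = 1:6,
                  profile_id = c(1, 1, 1, 2, 2, 2),
                  company = c("A", "A", "A", "B", "B", "B"),
                  product = c("book", "shirt", "cup", "book", "shirt", "cup"),
                  price = c(10.42, 23.91, 5.95, 7.99, 5.95, 11.76))

# Ascending rank within each company/profile_id group (1 = cheapest).
df2$rank <- with(df2, ave(price, company, profile_id, FUN = rank))

# Hypothetical descending variant (1 = most expensive): rank negated prices.
df2$rank_desc <- with(df2, ave(price, company, profile_id,
                               FUN = function(x) rank(-x)))

# Hypothetical helper: tied prices share the lowest rank of the tie
# instead of the default averaged rank.
rank_min <- function(x) rank(x, ties.method = "min")
tied <- rank_min(c(5.95, 5.95, 7.99))

print(df2)
```

With the sample data there are no ties inside a group, so the default ties.method and "min" give the same result; the difference only matters when two products in the same group share a price.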
