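My dataset is y. It has an ID column and a Sales column. I want to add a third column containing each employee's percentile based on their sales.
The formula for the percentile is: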
Percentile Employee(i) = (Number of employees with less sales)/(Total employees-1)
Thanks.
Answer 0 (score: 3)
Using your formula, consider the following solution with fake data:
# fake data
y <- data.frame(
  # 20 fake ids
  id = seq(1, 20),
  # 20 fake sales between 10000 and 15000
  sales = runif(20, 10000, 15000))
# define an employee count
emp_cnt <- length(y$id)
# rank your sales; tied values share the lowest rank
y$rank <- rank(y$sales, ties.method = "min")
# subtract one (i.e. the lowest rank) from each rank and divide by emp_cnt minus one
y$percentile <- (y$rank - 1) / (emp_cnt - 1)
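To see why this matches the formula in the question, note that with ties.method = "min" the rank minus one is the number of employees with strictly lower sales. A minimal sketch on a tiny vector with a tie; the helper name percentile_rank and the sample values are made up for illustration and are not part of the answer above:

```r
# Hypothetical helper (name is an assumption, not from the answer):
# fraction of employees with strictly lower sales, scaled by (n - 1).
percentile_rank <- function(sales) {
  n <- length(sales)
  # With ties.method = "min", rank - 1 counts the strictly smaller values
  (rank(sales, ties.method = "min") - 1) / (n - 1)
}

# Made-up sales figures; the two 5s are tied
sales <- c(8, 5, 4, 3, 5)
p <- percentile_rank(sales)
print(p)
```

The two tied employees get the same percentile, because both have the same number of colleagues below them. With a single employee the denominator is zero, so that case would need separate handling.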
Answer 1 (score: 0)
Use this:
within(y[order(y$sales), ], p <- with(rle(sales), rep(c(0, head(cumsum(lengths), -1)), lengths))/(length(ID)-1))
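The one-liner is dense, so here is a rough step-by-step unpacking of what it appears to do. The small data frame is made up for illustration, and the intermediate names (ys, runs, below) are my own, not from the answer:

```r
# Small illustrative data (an assumption, not from the original post)
y <- data.frame(ID = 1:6, sales = c(8, 5, 4, 3, 5, 3))

# 1. Sort rows by sales so that equal values sit next to each other
ys <- y[order(y$sales), ]

# 2. Run-length encode the sorted sales: one run per distinct value
runs <- rle(ys$sales)

# 3. Rows preceding each run = number of employees with strictly lower sales
below <- c(0, head(cumsum(runs$lengths), -1))

# 4. Repeat that count for every row in the run, then scale by (n - 1)
ys$p <- rep(below, runs$lengths) / (nrow(ys) - 1)

print(ys)
```

Note that the result stays sorted by sales rather than in the original row order.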
Sample output:
   ID sales         p
4   4     3 0.0000000
6   6     3 0.0000000
11 11     3 0.0000000
19 19     3 0.0000000
20 20     3 0.0000000
3   3     4 0.2631579
13 13     4 0.2631579
17 17     4 0.2631579
18 18     4 0.2631579
2   2     5 0.4736842
8   8     5 0.4736842
10 10     5 0.4736842
12 12     5 0.4736842
16 16     5 0.4736842
9   9     6 0.7368421
5   5     7 0.7894737
7   7     7 0.7894737
15 15     7 0.7894737
1   1     8 0.9473684
14 14     8 0.9473684
Data used:
   ID sales
1   1     8
2   2     5
3   3     4
4   4     3
5   5     7
6   6     3
7   7     7
8   8     5
9   9     6
10 10     5
11 11     3
12 12     5
13 13     4
14 14     8
15 15     7
16 16     5
17 17     4
18 18     4
19 19     3
20 20     3
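For anyone who wants to reproduce this, the data above can be rebuilt as a data frame and the rank-based and rle-based approaches compared directly. This is only a sketch; the variable names p_rank, res and p_rle are my own:

```r
# Rebuild the data shown above
y <- data.frame(
  ID = 1:20,
  sales = c(8, 5, 4, 3, 7, 3, 7, 5, 6, 5, 3, 5, 4, 8, 7, 5, 4, 4, 3, 3))

# Rank-based approach: rank - 1 counts employees with strictly lower sales
p_rank <- (rank(y$sales, ties.method = "min") - 1) / (nrow(y) - 1)

# rle-based approach, as posted (rows come back sorted by sales)
res <- within(y[order(y$sales), ], p <- with(rle(sales), rep(c(0, head(cumsum(lengths), -1)), lengths))/(length(ID)-1))

# Put the rle-based result back into ID order before comparing
p_rle <- res$p[order(res$ID)]

print(all.equal(p_rank, p_rle))
```

On this data the two approaches should give the same percentiles; the rank-based one keeps the original row order, which may be more convenient when adding a column to an existing data frame.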