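How do I sort a data frame by a column and get the index?

Asked: 2015-08-20 16:38:06

Tags: r sorting dataframe

I want to sort a data frame in R by a column and add the rank to a new column.

Specifically, I want to rank the price column of the data.frame in ascending order within each day. I then want to add a column indicating each hour's rank for that day.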

library(dplyr)
prices <- data.frame(time = c("2014-07-01 00:00:00 CEST","2014-07-01 01:00:00 CEST","2014-07-01 02:00:00 CEST","2014-07-01 03:00:00 CEST",
                  "2014-07-01 04:00:00 CEST","2014-07-01 05:00:00 CEST","2014-07-01 06:00:00 CEST","2014-07-01 07:00:00 CEST",
                  "2014-07-01 08:00:00 CEST","2014-07-01 09:00:00 CEST","2014-07-01 10:00:00 CEST","2014-07-01 11:00:00 CEST",
                  "2014-07-01 12:00:00 CEST","2014-07-01 13:00:00 CEST","2014-07-01 14:00:00 CEST","2014-07-01 15:00:00 CEST",
                  "2014-07-01 16:00:00 CEST","2014-07-01 17:00:00 CEST","2014-07-01 18:00:00 CEST","2014-07-01 19:00:00 CEST",
                  "2014-07-01 20:00:00 CEST","2014-07-01 21:00:00 CEST","2014-07-01 22:00:00 CEST","2014-07-01 23:00:00 CEST",
                  "2014-07-02 00:00:00 CEST","2014-07-02 01:00:00 CEST","2014-07-02 02:00:00 CEST","2014-07-02 03:00:00 CEST",
                  "2014-07-02 04:00:00 CEST","2014-07-02 05:00:00 CEST","2014-07-02 06:00:00 CEST","2014-07-02 07:00:00 CEST",
                  "2014-07-02 08:00:00 CEST","2014-07-02 09:00:00 CEST","2014-07-02 10:00:00 CEST","2014-07-02 11:00:00 CEST",
                  "2014-07-02 12:00:00 CEST","2014-07-02 13:00:00 CEST","2014-07-02 14:00:00 CEST","2014-07-02 15:00:00 CEST",
                  "2014-07-02 16:00:00 CEST","2014-07-02 17:00:00 CEST","2014-07-02 18:00:00 CEST","2014-07-02 19:00:00 CEST",
                  "2014-07-02 20:00:00 CEST","2014-07-02 21:00:00 CEST","2014-07-02 22:00:00 CEST","2014-07-02 23:00:00 CEST"),
         price = c(31.75,30.54,30.10,29.32,25.97,26.90,33.59,41.06,40.99,42.44,40.00,39.94,35.69,36.00,36.00,35.17,34.94,35.18,39.00,
                   41.92,40.09,38.87,39.38,36.00,30.26,29.29,29.37,25.15,25.81,27.97,31.63,39.91,39.99,39.61,39.13,40.43,38.41,36.96,
                   36.00,34.95,33.82,36.08,38.59,39.91,39.02,36.90,38.88,32.59))

I use arrange from dplyr to sort, as follows.

prices_sorted <- arrange(prices, as.Date(time), price)

Is there a 'clean' way to arrive at the following?

prices_ranked
                   time price ranking
1  2014-07-01 00:00:00 CEST 31.75       5
2  2014-07-01 01:00:00 CEST 30.54       6
3  2014-07-01 02:00:00 CEST 30.10       4
4  2014-07-01 03:00:00 CEST 29.32       3
5  2014-07-01 04:00:00 CEST 25.97       2
6  2014-07-01 05:00:00 CEST 26.90       1
7  2014-07-01 06:00:00 CEST 33.59       7
8  2014-07-01 07:00:00 CEST 41.06      17
9  2014-07-01 08:00:00 CEST 40.99      16
10 2014-07-01 09:00:00 CEST 42.44      18
11 2014-07-01 10:00:00 CEST 40.00      13
12 2014-07-01 11:00:00 CEST 39.94      14
13 2014-07-01 12:00:00 CEST 35.69      15
14 2014-07-01 13:00:00 CEST 36.00      24
15 2014-07-01 14:00:00 CEST 36.00      22
16 2014-07-01 15:00:00 CEST 35.17      19
17 2014-07-01 16:00:00 CEST 34.94      23
18 2014-07-01 17:00:00 CEST 35.18      12
19 2014-07-01 18:00:00 CEST 39.00      11
20 2014-07-01 19:00:00 CEST 41.92      21
21 2014-07-01 20:00:00 CEST 40.09       9
22 2014-07-01 21:00:00 CEST 38.87       8
23 2014-07-01 22:00:00 CEST 39.38      20
24 2014-07-01 23:00:00 CEST 36.00      10
25 2014-07-02 00:00:00 CEST 30.26       4
26 2014-07-02 01:00:00 CEST 29.29       5
27 2014-07-02 02:00:00 CEST 29.37       6
28 2014-07-02 03:00:00 CEST 25.15       2
29 2014-07-02 04:00:00 CEST 25.81       3
30 2014-07-02 05:00:00 CEST 27.97       1
31 2014-07-02 06:00:00 CEST 31.63       7
32 2014-07-02 07:00:00 CEST 39.91      24
33 2014-07-02 08:00:00 CEST 39.99      17
34 2014-07-02 09:00:00 CEST 39.61      16
35 2014-07-02 10:00:00 CEST 39.13      15
36 2014-07-02 11:00:00 CEST 40.43      18
37 2014-07-02 12:00:00 CEST 38.41      22
38 2014-07-02 13:00:00 CEST 36.96      14
39 2014-07-02 14:00:00 CEST 36.00      13
40 2014-07-02 15:00:00 CEST 34.95      19
41 2014-07-02 16:00:00 CEST 33.82      23
42 2014-07-02 17:00:00 CEST 36.08      21
43 2014-07-02 18:00:00 CEST 38.59      11
44 2014-07-02 19:00:00 CEST 39.91      10
45 2014-07-02 20:00:00 CEST 39.02       8
46 2014-07-02 21:00:00 CEST 36.90      20
47 2014-07-02 22:00:00 CEST 38.88       9
48 2014-07-02 23:00:00 CEST 32.59      12

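3 Answers:

Answer 0 (score: 1)

I'm a little unclear about what order you want, but is this what you are after? Updated to rank by date (I added some other data so you can see the effect).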

library(data.table)
prices <- data.table(time = c("2014-07-01 00:00:00 CEST", "2014-07-01 01:00:00 CEST", "2014-07-01 02:00:00 CEST","2014-07-01 03:00:00 CEST", "2014-07-01 04:00:00 CEST",
"2015-07-01 00:00:00 CEST", "2015-07-01 01:00:00 CEST", "2015-07-01 02:00:00 CEST","2015-07-01 03:00:00 CEST", "2015-07-01 04:00:00 CEST"),
         price = c(31.75, 30.54, 30.10, 29.32, 25.97,31.75, 30.12, 31.10, 39.32, 25.97))
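# Derive the calendar day from the timestamp string
prices <- prices[,"date" := as.Date(time)]

# Rank prices within each day; ties are broken by order of appearance
prices.sorted <- prices[order(time),ranking := rank(price,ties.method='first'), by=date]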
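
For comparison, the same per-day ranking can be sketched in base R with ave(), which applies a function within groups. This is an alternative to the data.table call above, not the answerer's method. The sample values are copied from this answer, and using substr() to pull out the date part is an assumption about the timestamp layout.

```r
# Sample data copied from the answer above: five hours on each of two days
prices <- data.frame(
  time = c("2014-07-01 00:00:00 CEST", "2014-07-01 01:00:00 CEST",
           "2014-07-01 02:00:00 CEST", "2014-07-01 03:00:00 CEST",
           "2014-07-01 04:00:00 CEST", "2015-07-01 00:00:00 CEST",
           "2015-07-01 01:00:00 CEST", "2015-07-01 02:00:00 CEST",
           "2015-07-01 03:00:00 CEST", "2015-07-01 04:00:00 CEST"),
  price = c(31.75, 30.54, 30.10, 29.32, 25.97,
            31.75, 30.12, 31.10, 39.32, 25.97),
  stringsAsFactors = FALSE
)

# Assumption: the first ten characters of each timestamp are the date
prices$date <- as.Date(substr(prices$time, 1, 10))

# Rank price in ascending order within each date; row order is left unchanged
prices$ranking <- ave(
  prices$price, prices$date,
  FUN = function(p) rank(p, ties.method = "first")
)

print(prices)
```

Here ties.method = "first" breaks ties by order of appearance, mirroring the choice made in the data.table version.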

Answer 1 (score: 0)

Maybe this:

prices %>% arrange(price) %>% mutate(ranking=min_rank(price)) %>% arrange(time)
#                      time price ranking
#1 2014-07-01 00:00:00 CEST 31.75    5
#2 2014-07-01 01:00:00 CEST 30.54    4
#3 2014-07-01 02:00:00 CEST 30.10    3
#4 2014-07-01 03:00:00 CEST 29.32    2
#5 2014-07-01 04:00:00 CEST 25.97    1
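
Since the question asks for a ranking within each day, a grouped variant of this pipeline may be closer to the goal. The following is a minimal sketch, assuming the first ten characters of time hold the date. The day column and the shortened sample data are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical shortened sample: three hours on one day, two on the next,
# with prices picked from the question's data (including a tie at 36.00)
prices <- data.frame(
  time = c("2014-07-01 13:00:00 CEST", "2014-07-01 15:00:00 CEST",
           "2014-07-01 23:00:00 CEST", "2014-07-02 00:00:00 CEST",
           "2014-07-02 01:00:00 CEST"),
  price = c(36.00, 35.17, 36.00, 30.26, 29.29),
  stringsAsFactors = FALSE
)

prices_ranked <- prices %>%
  mutate(day = as.Date(substr(time, 1, 10))) %>%  # assumed date extraction
  group_by(day) %>%                               # rank restarts for each day
  mutate(ranking = min_rank(price)) %>%           # tied prices share the lowest rank
  ungroup()

print(prices_ranked)
```

Because mutate() keeps the rows in place, no arrange() call is needed before or after the ranking.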

Answer 2 (score: 0)

Here is my solution, which uses the base R function sort:

prices %>% mutate(ranking = row_number(sort(price, decreasing = TRUE)))
                      time price ranking
1 2014-07-01 00:00:00 CEST 31.75       5
2 2014-07-01 01:00:00 CEST 30.54       4
3 2014-07-01 02:00:00 CEST 30.10       3
4 2014-07-01 03:00:00 CEST 29.32       2
5 2014-07-01 04:00:00 CEST 25.97       1
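
One caveat worth checking: the output above lines up because the five sample prices are already in descending order. The sketch below uses hypothetical, deliberately unsorted prices and base R's rank() with ties.method = "first", which is what dplyr documents row_number() as being equivalent to, to compare ranking the values directly with ranking a sorted copy.

```r
# Hypothetical prices, deliberately not in sorted order
price <- c(30.10, 31.75, 25.97)

# Ranking the values directly gives each row its own rank
rank_direct <- rank(price, ties.method = "first")

# Ranking a sorted copy always yields n, ..., 1, whatever the row order was
rank_of_sorted <- rank(sort(price, decreasing = TRUE), ties.method = "first")

print(rank_direct)     # 2 3 1
print(rank_of_sorted)  # 3 2 1
```

With unsorted input, ranking the column directly is the variant that keeps each rank attached to its own row.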