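Essentially, I want to know how to group a data frame into categories based on one column and then, within those categories, sort it by another column. Suppose we have the following data frame df: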
ID date value
current2 01/2018 1
current2 03/2018 2
past1 03/2012 4
past1 01/2012 3
current2 09/2018 7
past2 11/2012 1
current1 01/2018 2
current1 03/2018 8
current1 05/2018 13
current2 07/2018 2
past2 09/2012 5
current1 07/2018 1
current2 05/2018 2
past1 05/2012 4
past2 07/2012 3
current2 11/2018 7
past2 05/2012 1
current1 09/2018 2
current1 11/2018 8
past1 07/2012 13
past1 09/2012 2
past1 11/2012 5
past2 03/2012 2
past2 01/2012 5
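I want to sort the data in ascending order by the numeric part of ID (with the "past" version before the "current" one), and then, within each of these subcategories, in ascending order by month. My desired output is: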
ID date value
past1 01/2012 3
past1 03/2012 4
past1 05/2012 4
past1 07/2012 13
past1 09/2012 2
past1 11/2012 5
current1 01/2018 2
current1 03/2018 8
current1 05/2018 13
current1 07/2018 1
current1 09/2018 2
current1 11/2018 8
past2 01/2012 5
past2 03/2012 2
past2 05/2012 1
past2 07/2012 3
past2 09/2012 5
past2 11/2012 1
current2 01/2018 1
current2 03/2018 2
current2 05/2018 2
current2 07/2018 2
current2 09/2018 7
current2 11/2018 7
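I have tried many different solutions, but I can't seem to grasp the basic idea of sorting by two columns like this. Any help would be greatly appreciated.
Answer 0 (score: 1)
You can try this: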
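library(stringr)
library(dplyr)
# index is the last character of ID, i.e. its digit (assumes a single-digit suffix)
df <- df %>%
  mutate(index = str_sub(ID, -1)) %>%
  # desc(ID) puts "past" before "current" within each index
  dplyr::arrange(index, desc(ID), date) %>%
  select(-index)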
Edit: str_sub comes from the stringr package.
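The approach above takes only the last character of ID and compares date as text, which is fine for this data (single-digit suffixes, one year per ID). As a hedged sketch of a more general variant, the snippet below extracts the whole trailing number as an integer and converts the month/year strings to real dates before sorting. The helper columns num, prefix and when are hypothetical names introduced for illustration, and the data frame is a small subset of the question's data.

```r
library(dplyr)

# Small subset of the example data from the question
df <- data.frame(
  ID    = c("current2", "past1", "past1", "past2",
            "current1", "current1", "past2", "current2"),
  date  = c("03/2018", "03/2012", "01/2012", "11/2012",
            "01/2018", "03/2018", "01/2012", "01/2018"),
  value = c(2, 4, 3, 1, 2, 8, 5, 1),
  stringsAsFactors = FALSE
)

sorted <- df %>%
  mutate(
    # Trailing number as an integer, so "past10" would sort after "past2"
    num    = as.integer(sub("^[^0-9]*", "", ID)),
    # Text before the number: "past" or "current"
    prefix = sub("[0-9]+$", "", ID),
    # Prepend a day so the month/year string parses as a Date
    when   = as.Date(paste0("01/", date), format = "%d/%m/%Y")
  ) %>%
  # desc(prefix) puts "past" before "current" (reverse alphabetical order)
  arrange(num, desc(prefix), when) %>%
  select(-num, -prefix, -when)

print(sorted)
```

Sorting on desc(prefix) only works because "past" happens to come after "current" alphabetically; with other labels, a factor with explicit levels would be a safer choice.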
Answer 1 (score: 0)
A base R solution could be
df[with(df, order(regmatches(ID, regexpr("\\d+$", ID)), -rank(ID), date)),]
which returns
ID date value
1 past1 01/2012 3
2 past1 03/2012 4
3 past1 05/2012 4
4 past1 07/2012 13
5 past1 09/2012 2
6 past1 11/2012 5
7 current1 01/2018 2
8 current1 03/2018 8
9 current1 05/2018 13
10 current1 07/2018 1
11 current1 09/2018 2
12 current1 11/2018 8
13 past2 01/2012 5
14 past2 03/2012 2
15 past2 05/2012 1
16 past2 07/2012 3
17 past2 09/2012 5
18 past2 11/2012 1
19 current2 01/2018 1
20 current2 03/2018 2
21 current2 05/2018 2
22 current2 07/2018 2
23 current2 09/2018 7
24 current2 11/2018 7
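For reference, here is a minimal base R sketch of the same three-key sort that spells out each key separately. It assumes the "past before current" order should be stated explicitly through factor levels rather than derived from the ranking of ID, and it converts the date strings to Date objects. The names num, status and when are hypothetical, and the data frame is a small subset of the question's data.

```r
# Small subset of the example data from the question
df <- data.frame(
  ID    = c("current2", "past1", "past1", "past2",
            "current1", "current1", "past2", "current2"),
  date  = c("03/2018", "03/2012", "01/2012", "11/2012",
            "01/2018", "03/2018", "01/2012", "01/2018"),
  value = c(2, 4, 3, 1, 2, 8, 5, 1),
  stringsAsFactors = FALSE
)

# Key 1: trailing number of ID, compared numerically
num <- as.integer(sub("^[^0-9]*", "", df$ID))

# Key 2: "past" before "current", made explicit via factor levels
status <- factor(sub("[0-9]+$", "", df$ID), levels = c("past", "current"))

# Key 3: month/year parsed as a Date (first day of the month)
when <- as.Date(paste0("01/", df$date), format = "%d/%m/%Y")

# order() sorts by the first key and breaks ties with the following ones
sorted <- df[order(num, status, when), ]

# Subsetting keeps the original row names; reset them to 1..n
rownames(sorted) <- NULL

print(sorted)
```

The general idea for sorting by several columns is that order() accepts any number of keys, and each later key only matters among rows that tie on all earlier ones.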