I have a large data.frame of bond data that looks like this:
ISIN CF DATE
A 105.750 2016-09-30
B 104.875 2016-05-31
C 106.875 2017-02-13
D 103.875 2016-10-07
E 5.000 2016-04-21
E 5.000 2017-04-21
E 5.000 2018-04-21
E 5.000 2019-04-21
E 105.000 2020-04-21
F 7.800 2016-09-09
F 7.800 2017-09-09
F 7.800 2018-09-09
F 7.800 2019-09-09
F 107.800 2020-09-09
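I would like to group the rows by ISIN code and then sort the rows within each group by DATE in increasing order (already done in the example above). Then I want to sort the groups themselves (A, B, C, D, E, F) so that the group with the earliest date comes first, followed by the group with the second earliest date, and so on.
I would like the result to look like this: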
ISIN CF DATE
E 5.000 2016-04-21
E 5.000 2017-04-21
E 5.000 2018-04-21
E 5.000 2019-04-21
E 105.000 2020-04-21
B 104.875 2016-05-31
F 7.800 2016-09-09
F 7.800 2017-09-09
F 7.800 2018-09-09
F 7.800 2019-09-09
F 107.800 2020-09-09
A 105.750 2016-09-30
D 103.875 2016-10-07
C 106.875 2017-02-13
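I tried this:
df <- df[order(df$ISIN, df$DATE), ]
But it does not do what I want: it orders the groups alphabetically by ISIN rather than by their earliest date.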
Answer 0 (score: 3)
This does the job. Basically, first create a rank for each ISIN based on its minimum date, then sort by that rank:
library(data.table)
setDT(DF)
DF[DF[ , min(DATE), by = ISIN
       ][ , .(ISIN, rank = frank(V1))
          ], on = "ISIN"
   ][order(rank, DATE)]
# ISIN CF DATE rank
# 1: E 5.000 2016-04-21 1
# 2: E 5.000 2017-04-21 1
# 3: E 5.000 2018-04-21 1
# 4: E 5.000 2019-04-21 1
# 5: E 105.000 2020-04-21 1
# 6: B 104.875 2016-05-31 2
# 7: F 7.800 2016-09-09 3
# 8: F 7.800 2017-09-09 3
# 9: F 7.800 2018-09-09 3
# 10: F 7.800 2019-09-09 3
# 11: F 107.800 2020-09-09 3
# 12: A 105.750 2016-09-30 4
# 13: D 103.875 2016-10-07 5
# 14: C 106.875 2017-02-13 6
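The chained call above can be hard to read, so here is the same idea split into separate steps, as a minimal sketch. The sample data is rebuilt from the question; the names first_dates, first_date and res are mine, and ties.method = "first" is my own choice (not part of the answer) so that two groups sharing an earliest date would still get distinct ranks.

```r
library(data.table)

# Sample data reconstructed from the question (DATE stored as Date)
DF <- data.table(
  ISIN = c("A", "B", "C", "D", rep("E", 5), rep("F", 5)),
  CF   = c(105.750, 104.875, 106.875, 103.875,
           5, 5, 5, 5, 105, 7.8, 7.8, 7.8, 7.8, 107.8),
  DATE = as.Date(c("2016-09-30", "2016-05-31", "2017-02-13", "2016-10-07",
                   "2016-04-21", "2017-04-21", "2018-04-21", "2019-04-21",
                   "2020-04-21", "2016-09-09", "2017-09-09", "2018-09-09",
                   "2019-09-09", "2020-09-09"))
)

# Step 1: one row per ISIN holding its earliest date
first_dates <- DF[ , .(first_date = min(DATE)), by = ISIN]

# Step 2: rank the groups by that earliest date
# (ties.method = "first" is an assumption of this sketch)
first_dates[ , rank := frank(first_date, ties.method = "first")]

# Step 3: join the rank back onto every row, then sort by rank and DATE
res <- DF[first_dates, on = "ISIN"][order(rank, DATE)]
print(res)
```

The result carries two helper columns (first_date and rank) that can be dropped afterwards if they are not needed.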
If you want to avoid creating a copy, do this instead:
DF[DF[ , min(DATE), by = ISIN
       ][ , .(ISIN, rank = frank(V1))
          ], rank := i.rank, on = "ISIN"]  # i.rank: the rank column of the joined lookup table
setorder(DF, rank, DATE)
If you do not want to create the rank column, use factor levels instead:
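# order() gives the positions that sort V1, so ISIN[order(V1)] lists the ISINs
# from earliest to latest first date (indexing by frank(V1) would not)
ord <- DF[ , min(DATE), by = ISIN][ , ISIN[order(V1)]]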
DF[ , ISIN := factor(ISIN, levels = ord)]
DF[order(ISIN, DATE)]
# ISIN CF DATE
# 1: E 5.000 2016-04-21
# 2: E 5.000 2017-04-21
# 3: E 5.000 2018-04-21
# 4: E 5.000 2019-04-21
# 5: E 105.000 2020-04-21
# 6: B 104.875 2016-05-31
# 7: F 7.800 2016-09-09
# 8: F 7.800 2017-09-09
# 9: F 7.800 2018-09-09
# 10: F 7.800 2019-09-09
# 11: F 107.800 2020-09-09
# 12: A 105.750 2016-09-30
# 13: D 103.875 2016-10-07
# 14: C 106.875 2017-02-13
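When building the factor levels, the ISINs have to be listed in sorted order of their earliest dates, which is what order() returns. A rank vector is the inverse permutation and generally produces a different sequence. Here is a small base R sketch of the difference, using the six earliest dates from the sample data (the variable names isin, first_date, by_order and by_rank are mine):

```r
# One entry per group, in order of first appearance in the data
isin <- c("A", "B", "C", "D", "E", "F")
first_date <- as.Date(c("2016-09-30", "2016-05-31", "2017-02-13",
                        "2016-10-07", "2016-04-21", "2016-09-09"))

# order(): positions of the elements from earliest to latest
by_order <- isin[order(first_date)]
print(by_order)  # "E" "B" "F" "A" "D" "C"  (the desired level order)

# rank(): the rank of each element; indexing with it scrambles the groups
by_rank <- isin[rank(first_date)]
print(by_rank)   # "D" "B" "F" "E" "A" "C"
```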
You can also do this in base R, though it will be somewhat slower:
ord <- names(sort(by(DF, DF$ISIN, function(x) min(x$DATE))))
DF$ISIN <- factor(DF$ISIN, levels = ord)
DF[with(DF, order(ISIN, DATE)),]
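Another base R route, offered as a sketch rather than as part of the answer above, is to compute each row's group minimum with ave() and pass it straight to order(). This leaves ISIN as a character column. The names df, first_date and sorted are mine, and using ISIN as the tie-breaker between groups with equal earliest dates is an assumption of this sketch.

```r
# Sample data reconstructed from the question (DATE stored as Date)
df <- data.frame(
  ISIN = c("A", "B", "C", "D", rep("E", 5), rep("F", 5)),
  CF   = c(105.750, 104.875, 106.875, 103.875,
           5, 5, 5, 5, 105, 7.8, 7.8, 7.8, 7.8, 107.8),
  DATE = as.Date(c("2016-09-30", "2016-05-31", "2017-02-13", "2016-10-07",
                   "2016-04-21", "2017-04-21", "2018-04-21", "2019-04-21",
                   "2020-04-21", "2016-09-09", "2017-09-09", "2018-09-09",
                   "2019-09-09", "2020-09-09")),
  stringsAsFactors = FALSE
)

# Earliest date of each row's group (as a number), repeated on every row
first_date <- ave(as.numeric(df$DATE), df$ISIN, FUN = min)

# Sort by the group's earliest date, then ISIN (tie-breaker), then DATE
sorted <- df[order(first_date, df$ISIN, df$DATE), ]
rownames(sorted) <- NULL
print(sorted)
```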
Answer 1 (score: 2)
With dplyr, you can do the following:
library(dplyr)
df %>% group_by(ISIN) %>%
  mutate(minDate = paste0(min(DATE), ISIN)) %>%
  arrange(DATE) %>% ungroup() %>% arrange(minDate) %>%
  select(-minDate)
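Note that the temporary minDate column also contains the ISIN, so that two groups sharing the same minimum date are kept apart instead of being interleaved. If you do not need this, change mutate(minDate = paste0(min(DATE), ISIN)) to mutate(minDate = min(DATE)).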
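The same tie handling can also be sketched without pasting strings together, by keeping the minimum date as a Date and listing ISIN as a second sort key. This is a variation of mine, not the answer's code. The tiny data set below is hypothetical and chosen so that groups X and Y share the same earliest date; min_date and sorted are illustrative names.

```r
library(dplyr)

# Hypothetical data: groups X and Y both start on 2016-04-21
df <- data.frame(
  ISIN = c("Y", "X", "Y", "X"),
  CF   = c(5, 7, 105, 107),
  DATE = as.Date(c("2016-04-21", "2016-04-21", "2017-04-21", "2017-06-01")),
  stringsAsFactors = FALSE
)

sorted <- df %>%
  group_by(ISIN) %>%
  mutate(min_date = min(DATE)) %>%     # earliest date of each group
  ungroup() %>%
  arrange(min_date, ISIN, DATE) %>%    # groups by earliest date, ISIN breaks ties
  select(-min_date)

print(sorted)
```

With this data the rows of X come first, followed by the rows of Y, each group sorted by DATE.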