Question

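I have a data frame crsppofo that contains monthly financial data with several variables. For my question, the following columns matter:

   PERMNO monthyear BetaShr
1:  85814    199501     0.5
2:  12345    199501     1.0
3:  85814    200002     1.5
4:  56789    200002     2.0
5:  12345    200002     2.5

"PERMNO" identifies each stock, "monthyear" obviously shows the year and month, and "BetaShr" is my risk measure, sorted in ascending order.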

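What I want to do is assign deciles (1 to 10) based on "BetaShr", but grouped by "monthyear". The lowest decile rank should go to the lowest 10% of "BetaShr" in each month. The output should look like this:

   PERMNO monthyear BetaShr BetaDecileRank
1:  85814    199501     0.5              1
2:  12345    199501     1.0             10
3:  85814    200002     1.5              1
4:  56789    200002     2.0              5
5:  12345    200002     2.5             10

Of course, this is only a simplified example in which just three decile ranks are assigned, to illustrate the desired output (assuming "BetaShr" ranges from 0.5 to 1.0 for 199501 and from 1.5 to 2.5 for 200002). You get the idea.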

After some research, I came up with the following code:

library(purrr)
library(StatMeasures)
library(dplyr)
crsppofo <- crsppofo %>%
  split(crsppofo$monthyear) %>%
  map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))

which results in the error:

Error: Column `BetaDecileRank` must be length 2524 (the group size) or one, not 896935

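Any help with this would be greatly appreciated. Feel free to improve my code or suggest an entirely different approach. If you need more information, please let me know in the comments. I am also happy to improve my question and my presence on SO, since I am new to both this forum and R.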

Answer 1

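The problem is that, inside each split group, decile is applied to the 'BetaShr' column of the whole dataset rather than to the rows of the split piece. The result therefore has as many elements as the full data frame (896935), while mutate expects one value per row of the current piece (2524), hence the length error.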

... %>%
    map_df(~ mutate(., BetaDecileRank = decile(crsppofo$BetaShr)))
                                               ^^^^

It should be

decile(.$BetaShr)
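
To make the length mismatch concrete, here is a minimal base-R sketch using the toy data from the question. The names pieces, piece_rows, full_len and piece_len are introduced here only for illustration. It shows that each split piece has fewer rows than the full column that crsppofo$BetaShr refers to, while the column taken from the piece itself has the matching length.

```r
# Toy data matching the example in the question
crsppofo <- data.frame(
  PERMNO    = c(85814L, 12345L, 85814L, 56789L, 12345L),
  monthyear = c(199501L, 199501L, 200002L, 200002L, 200002L),
  BetaShr   = c(0.5, 1, 1.5, 2, 2.5)
)

# One data frame per month, as split() produces in the pipeline
pieces <- split(crsppofo, crsppofo$monthyear)

# Number of rows in each monthly piece
piece_rows <- sapply(pieces, nrow)

# Length of the full column: this is what crsppofo$BetaShr always refers to,
# no matter which piece is currently being processed
full_len <- length(crsppofo$BetaShr)

# Length of the column taken from each piece (what .$BetaShr refers to)
piece_len <- sapply(pieces, function(d) length(d$BetaShr))
```

Here full_len is larger than every entry of piece_rows, which is the same situation the error message reports for the real data, whereas piece_len agrees with piece_rows.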

-full code

library(dplyr)
library(purrr)
library(StatMeasures)
crsppofo <- crsppofo %>%
              split(crsppofo$monthyear) %>%
              map_df(~ mutate(., BetaDecileRank = decile(.$BetaShr)))
crsppofo
#  PERMNO monthyear BetaShr BetaDecileRank
#1  85814    199501     0.5              1
#2  12345    199501     1.0             10
#3  85814    200002     1.5              1
#4  56789    200002     2.0              5
#5  12345    200002     2.5             10

Note that we do not need to split and then loop with map. Instead, the same result can be obtained with group_by/mutate

crsppofo %>% 
   group_by(monthyear) %>% 
   mutate(BetaDecileRank = decile(BetaShr))
# A tibble: 5 x 4
# Groups:   monthyear [2]
#  PERMNO monthyear BetaShr BetaDecileRank
#   <int>     <int>   <dbl>          <int>
#1  85814    199501     0.5              1
#2  12345    199501     1               10
#3  85814    200002     1.5              1
#4  56789    200002     2                5
#5  12345    200002     2.5             10
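
If neither dplyr nor StatMeasures is available, the grouping idea can be sketched in base R with ave(). The helper decile_sketch below is hypothetical and is not StatMeasures::decile: it bins values by the within-group quantile breaks from quantile(), so ranks for values in the middle of a group may differ from those shown above. Treat it as an illustration of per-group computation under that assumption, not as a drop-in replacement.

```r
# Toy data matching the example in the question
crsppofo <- data.frame(
  PERMNO    = c(85814L, 12345L, 85814L, 56789L, 12345L),
  monthyear = c(199501L, 199501L, 200002L, 200002L, 200002L),
  BetaShr   = c(0.5, 1, 1.5, 2, 2.5)
)

# Hypothetical helper (an assumption, not the StatMeasures implementation):
# bin x into 10 intervals using its own 0%, 10%, ..., 100% quantiles
decile_sketch <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.1),
                     na.rm = TRUE, names = FALSE)
  # rightmost.closed / all.inside keep the maximum in bin 10 instead of 11
  findInterval(x, breaks, rightmost.closed = TRUE, all.inside = TRUE)
}

# ave() applies the helper separately within each monthyear group and
# returns the results in the original row order
crsppofo$BetaDecileRank <- ave(crsppofo$BetaShr, crsppofo$monthyear,
                               FUN = decile_sketch)
```

As with group_by/mutate, the helper only ever sees the BetaShr values of one month at a time, so the smallest value in each month lands in rank 1 and the largest in rank 10.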

data

crsppofo <- structure(list(PERMNO = c(85814L, 12345L, 85814L, 56789L, 12345L
), monthyear = c(199501L, 199501L, 200002L, 200002L, 200002L), 
    BetaShr = c(0.5, 1, 1.5, 2, 2.5)), class = "data.frame",
    row.names = c("1:", 
"2:", "3:", "4:", "5:"))

R: Calculating deciles by group
