Sorting data into deciles based on a rolling subset

Time: 2016-02-14 14:43:54

Tags: r sorting data.table dplyr percentile

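I am trying to replicate the Fama and French (1993) paper using R. I need to perform the following sort:

  1. For each month,
  2. compute the ME deciles using NYSE stocks only,
  3. assign all stocks to the deciles created in step 2.

Data generation: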

        set.seed(1234)
        n = 120
        stocks <- c("A", "B", "C", "D", "E")
        exchange <- c("NYSE", "NASDAQ", "AMEX")
        df <- as.data.frame(cbind(Month = 1:12,
                          exchangeCode = exchange[round(runif(n, 1, 3))],
                          Stock = stocks[round(runif(n, 1, 5))],
                          ME=floor(100*abs(rnorm(n)))))
    

Desired output (shown here for month 1 only):

    ME_NYSE_vals <- as.numeric(paste(df[df$Month==1 & df$exchangeCode=="NYSE","ME"]))
    
    ME_ALL_vals <- as.numeric(paste(df[df$Month==1,"ME"]))
    
    cut(x = ME_ALL_vals,
    breaks = c(-Inf,quantile(ME_NYSE_vals,probs=seq(.1,.9,.1)),+Inf),
    labels = 1:10
    )
    

The breaks should be computed from ME_NYSE_vals. The cut should then be applied to all of ME_ALL_vals, separately for each month.

1 answer:

Answer 0 (score: 1)

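If the goal is to keep the whole data frame but generate deciles only for the NYSE values, the code below will do it. The key is to produce decile labels only for the entries that belong to NYSE while keeping the full data set, which amounts to a form of partial sorting.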

    # Libs
    Vectorize(require)(package = c("dplyr", "magrittr"),
                       character.only = TRUE)
    # Transformations
    # ntile() ranks ME over every row; non-NYSE rows are then masked with NA
    df %<>%
        mutate(nTileNYSE = ifelse(exchangeCode == "NYSE", ntile(ME, 10), NA)) %>%
        arrange(nTileNYSE)

The code was applied to the following data:

    set.seed(1)
    df <- as.data.frame(cbind(exchangeCode = c("NYSE", "NASDAQ"),
                              Stock = c("A", "B", "C", "A"),
                              Month = 1:12,
                              ME = rnorm(1200)))
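If the deciles themselves should be formed from the NYSE rows alone, one possible variation is to blank out the non-NYSE values before ranking, since `ntile()` leaves `NA` inputs as `NA`. The sketch below is not part of the original answer; the `toy` data frame is a hypothetical example with a numeric `ME` column.

```r
library(dplyr)

# Hypothetical toy data (not from the original post); ME is kept numeric
toy <- data.frame(
  exchangeCode = rep(c("NYSE", "NASDAQ"), each = 10),
  ME           = c(1:10, 101:110)
)

toy <- toy %>%
  # Replace non-NYSE values with NA first, so ntile() ranks NYSE rows only
  mutate(nTileNYSE = ntile(ifelse(exchangeCode == "NYSE", ME, NA), 10)) %>%
  # Sort by decile; rows without a decile (NA) end up last
  arrange(nTileNYSE)
```

With this ordering of operations the NASDAQ values cannot influence the NYSE ranking, which is the difference from masking the result after `ntile()` has already run over every row.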

Second approach

Following the discussion in the comments, I would suggest the following approach:

    # Libs --------------------------------------------------------------------

    Vectorize(require)(package = c("tidyr", "dplyr", "magrittr", "xts", "Hmisc"),
                       character.only = TRUE)

    # Data generation ---------------------------------------------------------

    set.seed(1234)
    n = 120
    stocks <- c("A", "B", "C", "D", "E")
    exchange <- c("NYSE", "NASDAQ", "AMEX")
    df <- as.data.frame(cbind(Month = 1:12,
                              exchangeCode = exchange[round(runif(n, 1, 3))],
                              Stock = stocks[round(runif(n, 1, 5))],
                              ME = floor(100*abs(rnorm(n)))))

    # Transformations ---------------------------------------------------------

    # cbind() coerces every column to character (factors before R 4.0),
    # so ME has to be converted back to numeric
    df$ME <- as.numeric(as.character(df$ME))

    # Generate cuts
    # Note: the breaks are taken from all NYSE rows of df (every month pooled)
    dfNtiles <- df %>%
      arrange(exchangeCode, Month, ME) %>%
      group_by(exchangeCode, Month) %>%
      mutate(cutsBsdOnNYSE = cut(x = ME,
                                 breaks = cut2(x = df$ME[df$exchangeCode == "NYSE"],
                                               g = 10, onlycuts = TRUE))) %>%
      ungroup() %>%
      group_by(cutsBsdOnNYSE) %>%
      # n() stores the number of rows in each bracket; use
      # as.integer(cutsBsdOnNYSE) instead if a sequential group id is wanted
      mutate(grpBsdOnNYSE = n())

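The idea is simple:

  • Generate the cut brackets from a subset of the data.
  • Apply those brackets to the whole vector (ME).
  • Number the resulting groups so that there is a group identifier.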
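The three steps above can also be carried out month by month, which is what the question asks for. The following base R sketch is not the answer's code: it reuses the `cut()`/`quantile()` call from the question, and the `toy` data frame and the helper `assign_nyse_decile()` are hypothetical names introduced only for illustration.

```r
# Hypothetical toy data (not from the original post):
# 3 months, 20 NYSE and 20 NASDAQ stocks per month, numeric ME
toy <- data.frame(
  Month        = rep(1:3, each = 40),
  exchangeCode = rep(rep(c("NYSE", "NASDAQ"), each = 20), times = 3),
  ME           = rep(c(seq(10, 200, by = 10), seq(5.5, 195.5, by = 10)), times = 3) *
                 rep(1:3, each = 40)
)

# Assign every stock of one month to a decile whose breakpoints
# are computed from that month's NYSE stocks only
assign_nyse_decile <- function(month_df) {
  nyse_me <- month_df$ME[month_df$exchangeCode == "NYSE"]
  breaks  <- c(-Inf, quantile(nyse_me, probs = seq(0.1, 0.9, 0.1)), Inf)
  month_df$decile <- cut(month_df$ME, breaks = breaks, labels = 1:10)
  month_df
}

# Split by month, apply the helper, and stack the pieces back together
result <- do.call(rbind, lapply(split(toy, toy$Month), assign_nyse_decile))
```

Padding the breaks with `-Inf` and `Inf` keeps stocks that fall outside the NYSE range in the lowest or highest decile instead of turning them into `NA`. Note that `cut()` stops with an error when two quantiles coincide, so heavily tied data would need extra handling.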

The dplyr pipeline above boils down to: