Question

I'm having trouble building decile portfolios. This is my dataset, X: each row represents one accounting period and each column represents a firm.

I'm trying to get every decile breakpoint in each period.

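 Decile_X <- data.frame(matrix(nrow = 11, ncol = 56))   # 11 breakpoints x 56 periods
 for (i in 1:56) {                                      # i indexes periods (rows of X)
   x <- unlist(X[i, ])                                  # firm values in period i as a plain vector
   Decile_X[, i] <- as.numeric(quantile(x, probs = seq(0, 1, length.out = 11),
                                        type = 5, na.rm = TRUE))
 }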

This produces the following result (quantiles in each period; each column represents a period).
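The same breakpoint matrix can also be built without preallocating a data frame, by applying quantile() to each row. This is a minimal sketch, assuming X is a numeric matrix or data frame with periods in rows and firms in columns; the toy X below is made-up data for illustration only. apply() over rows returns one column per period, so the layout matches the question's Decile_X (11 breakpoints x periods), except that the result is a matrix rather than a data frame.

```r
# Hypothetical toy data: 3 periods (rows) x 20 firms (columns), with one missing value
set.seed(1)
X <- matrix(rnorm(60), nrow = 3, ncol = 20)
X[2, 5] <- NA

# One column of 11 breakpoints (0%, 10%, ..., 100%) per period;
# probs, type and na.rm are passed through apply() to quantile()
Decile_X <- apply(as.matrix(X), 1, quantile,
                  probs = seq(0, 1, length.out = 11), type = 5, na.rm = TRUE)

print(dim(Decile_X))  # 11 rows (breakpoints) x 3 columns (periods)
```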

Using this result, I'm trying to get the mean of X within each range (0-10%, 10-20%, ..., 90-100%) for every period.

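Df <- data.frame(matrix(nrow = 10, ncol = 56))   # 10 deciles x 56 periods
for (i in 1:nrow(X)) {                           # i indexes periods (rows of X)
  x <- unlist(X[i, ])                            # firm values in period i
  for (j in 1:10) {                              # j indexes deciles
    # firms in decile j: lower < x <= upper (decile 1 also includes the minimum)
    in_bin <- !is.na(x) & x <= Decile_X[j + 1, i] &
      (x > Decile_X[j, i] | (j == 1 & x == Decile_X[j, i]))
    Df[j, i] <- mean(x[in_bin])
  }
}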

The problem is that in some periods Decile_X shows 0.000000000 for several breakpoints (around 40-50%, 50-60% and 60-70%), so I can't split the firms accurately.
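To see why several breakpoints come out as exactly zero: when many firms in a period report the same value (for example, zero tax expense), consecutive quantiles coincide, and every bin bounded by two equal breakpoints is empty. The small illustration below uses made-up data for a single period and the question's "lower < x <= upper" rule. Note that with this strict lower bound, the six zero-valued firms fall into no bin at all.

```r
# Hypothetical period: 6 of 10 firms report 0
x <- c(0, 0, 0, 0, 0, 0, 3, 5, 8, 12)

# Decile breakpoints as in the question (type 5)
breaks <- quantile(x, probs = seq(0, 1, length.out = 11), type = 5)
print(unname(breaks))  # the 0% to 50% breakpoints are all 0

# Count firms per bin using lower < x <= upper
counts <- sapply(1:10, function(j) sum(x > breaks[j] & x <= breaks[j + 1]))
print(counts)  # 0 0 0 0 0 0 1 1 1 1
```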

Is there a way to fix this? Or is my way of building decile portfolios just very inefficient?

I'm new to R and have tried to explain this in detail. Please help.
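One common way around tied breakpoints, offered here as an alternative technique rather than a fix to the breakpoint comparison above, is to assign deciles by rank instead of by value: firms are ordered within a period and split into ten groups of (nearly) equal size, with ties broken by position. The helper name decile_means_by_rank and the toy data are hypothetical, and this is only a sketch. Whether it is acceptable to put firms with identical values into different portfolios depends on your research design.

```r
# Hypothetical helper: mean of x within each of n rank-based groups.
# Ties are broken by order of appearance (ties.method = "first"),
# so equal values can land in different groups; NAs are dropped first.
decile_means_by_rank <- function(x, n = 10) {
  x <- x[!is.na(x)]
  r <- rank(x, ties.method = "first")        # distinct ranks 1..length(x)
  group <- ceiling(n * r / length(x))        # group number 1..n
  means <- tapply(x, factor(group, levels = 1:n), mean)
  as.numeric(means)                          # empty groups give NA
}

# Hypothetical period where 6 of 10 firms report 0
x <- c(0, 0, 0, 0, 0, 0, 3, 5, 8, 12)
print(decile_means_by_rank(x))  # 0 0 0 0 0 0 3 5 8 12

# Apply to every period (row) of a periods-by-firms matrix:
# the result has one row per decile and one column per period.
X <- rbind(x, rev(x), c(NA, 1:9))
Df <- apply(X, 1, decile_means_by_rank)
print(dim(Df))
```

With only 9 non-missing firms (third period), one group necessarily stays empty and its mean is NA.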

Answer 1

I hope I understand your problem correctly.

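Basically, this is how I would compute the arithmetic mean within each decile. First, though, I've added some dummy data, so you can copy this straight into your R IDE and it should run as an example without any changes.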

# Some dummy data
c1 <- c(1:100)
c2 <- c(301:400)
c3 <- c(101:200)
c4 <- c(201:300)
df <- cbind(c1, c2, c3, c4)

Here I set quant_n, the number of "partitions" (for lack of a better word).

quant_n <- 10 # 10 for deciles, 4 for quartiles, et cetera.
# Function for computing the mean within each part of the n-tile
quantile_ave <- function(x, y = quant_n){
    z <- 1 / y
    q <- quantile(x, seq(0, 1, by = z), na.rm = TRUE)  # n-tile breakpoints, ignoring NAs
    cuts <- cut(x, q, include.lowest = TRUE)           # keep the minimum in the first part
    values_per_quantile <- split(x, cuts)              # values grouped by part
    calc_mean <- sapply(values_per_quantile, mean)     # mean of each part
    names(calc_mean) <- NULL
    calc_mean
}
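One detail worth checking in quantile_ave: cut() treats each interval as (a, b], and with its default include.lowest = FALSE a value equal to the lowest break (the minimum) becomes NA and is silently left out of the first mean; include.lowest = TRUE keeps it. cut() also stops with an error when breakpoints repeat, which is exactly the tied-zero situation from the question. A small demonstration on made-up vectors:

```r
x <- 1:10
q <- quantile(x, probs = seq(0, 1, by = 0.5))   # breaks 1, 5.5, 10

# Default: the minimum (1) equals the lowest break and becomes NA
print(table(cut(x, q), useNA = "ifany"))

# include.lowest = TRUE closes the first interval, so all 10 values are kept
print(table(cut(x, q, include.lowest = TRUE)))

# Repeated breakpoints (e.g. many zeros) make cut() fail
res <- tryCatch(cut(c(0, 0, 0, 1, 2), c(0, 0, 1, 2)),
                error = function(e) conditionMessage(e))
print(res)   # "'breaks' are not unique"
```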

# Here we put quantile_ave to work on each column of the dummy data in df
results <- matrix(0L, nrow = quant_n, ncol = ncol(df)) # matrix to overwrite with results
for (i in 1:ncol(df)){
    results[, i] <- quantile_ave(df[, i])
}

Hope that helps.
