R: Create a quartile rank column for each date in panel data

Time: 2018-09-06 14:06:08

Tags: r quartile

I have the following panel data:

idNum        date                 salePrice
1           01.2018                  1
1           02.2018                  2
2           01.2018                  3
2           02.2018                  4
...            ...                    ...

I would like a new column that shows the quartile rank within each date, like this:

idNum        date                 salePrice quartilerank
1           01.2018                  1           1
1           02.2018                  2           1
2           01.2018                  3           2
2           02.2018                  4           2
...            ...                    ...

Using the following code

TER <- within(TER, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))

only gives me quartile ranks computed over all sale prices at once, without distinguishing between dates.

2 Answers:

Answer 0: (score: 1)

If I understand correctly, you need to compute the quartiles within each date, so this might help:

# some fake data
data <- data.frame(idNum=c(1,1,2,2,3,3,4,4),
                   date=c('01.2018','02.2018','01.2018','02.2018','01.2018','02.2018','01.2018','02.2018'),
                   salePrice=c(1,2,3,4,5,6,7,8))   

data
  idNum    date salePrice
1     1 01.2018         1
2     1 02.2018         2
3     2 01.2018         3
4     2 02.2018         4
5     3 01.2018         5
6     3 02.2018         6
7     4 01.2018         7
8     4 02.2018         8

# an empty list to populate     
qlist <- list()

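# the loop that creates the list with quartiles for each distinct date
for (k in unique(data$date)) {
  subdata <- subset(data, date == k)
  # break at this date's own quantiles; cut(x, 4) would give equal-width bins instead
  subdata$quartile <- cut(subdata$salePrice,
                          quantile(subdata$salePrice, probs = 0:4/4),
                          include.lowest = TRUE, labels = FALSE)
  qlist[[k]] <- subdata
}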

# have it as a df
df <- do.call("rbind",qlist) 
df
          idNum    date salePrice quartile
01.2018.1     1 01.2018         1        1
01.2018.3     2 01.2018         3        2
01.2018.5     3 01.2018         5        3
01.2018.7     4 01.2018         7        4
02.2018.2     1 02.2018         2        1
02.2018.4     2 02.2018         4        2
02.2018.6     3 02.2018         6        3
02.2018.8     4 02.2018         8        4
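
The same per-date ranking can probably be done without an explicit loop. Below is a minimal base R sketch, assuming the same fake data as above; `quartile_rank` is a hypothetical helper name introduced only for this illustration, and the column name `quartilerank` is taken from the question. It reuses the `cut()`/`quantile()` call from the question, but applies it within each date via `ave()`.

```r
# Same fake data as in the answer above
data <- data.frame(
  idNum = c(1, 1, 2, 2, 3, 3, 4, 4),
  date = c("01.2018", "02.2018", "01.2018", "02.2018",
           "01.2018", "02.2018", "01.2018", "02.2018"),
  salePrice = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Hypothetical helper (not part of the original answer): rank each value of x
# by the quartile of x it falls into, using x's own quantiles as break points
quartile_rank <- function(x) {
  as.integer(cut(x, quantile(x, probs = 0:4 / 4), include.lowest = TRUE))
}

# ave() applies the helper separately within each date and returns the
# results in the original row order, so no rbind step is needed
data$quartilerank <- ave(data$salePrice, data$date, FUN = quartile_rank)
data
```

One caveat with this sketch: `cut()` stops with a "breaks are not unique" error when ties in a date's prices make two quantiles coincide, so it assumes enough distinct prices per date.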

Answer 1: (score: 1)

An alternative approach using data.table and findInterval:

library(data.table)
setDT(df)[ ,.(idNum,salePrice,
               quartilerank=findInterval(salePrice,quantile(salePrice),all.inside = TRUE)),
              by=date]

#returns
     date idNum salePrice quartilerank
1: 1.2018     1         1            1
2: 1.2018     2         3            4
3: 2.2018     1         2            1
4: 2.2018     2         4            4
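
If the goal is to keep the original table and row order rather than return a new grouped result, a variant with data.table's `:=` operator might be more convenient. This is only a sketch under assumptions: the table name `dt` is hypothetical, and the eight-row data is borrowed from the other answer so that each date has four observations.

```r
library(data.table)

# Hypothetical table name; eight-row fake data with four observations per date
dt <- data.table(
  idNum = c(1, 1, 2, 2, 3, 3, 4, 4),
  date = c("01.2018", "02.2018", "01.2018", "02.2018",
           "01.2018", "02.2018", "01.2018", "02.2018"),
  salePrice = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# := adds the column by reference, computed separately for each date;
# all.inside = TRUE puts each date's maximum in interval 4 rather than 5
dt[, quartilerank := findInterval(salePrice, quantile(salePrice),
                                  all.inside = TRUE),
   by = date]
dt
```

With only two observations per date, as in the output above, the two prices are the minimum and maximum of their group, so they land in ranks 1 and 4.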