I have the following panel data:
idNum date salePrice
1 01.2018 1
1 02.2018 2
2 01.2018 3
2 02.2018 4
... ... ...
I would like a new column that shows the quartile rank of each sale price within its date, like this:
idNum date salePrice quartilerank
1 01.2018 1 1
1 02.2018 2 1
2 01.2018 3 2
2 02.2018 4 2
... ... ... ...
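Using the following function
TER <- within(TER, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))
only gives me the quartile rank based on all sale prices together, without distinguishing between dates.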
Answer 0 (score: 1)
If I understand correctly, you need to compute the quartiles within each date, so this might help:
# some fake data
data <- data.frame(idNum=c(1,1,2,2,3,3,4,4),
date=c('01.2018','02.2018','01.2018','02.2018','01.2018','02.2018','01.2018','02.2018'),
salePrice=c(1,2,3,4,5,6,7,8))
data
idNum date salePrice
1 1 01.2018 1
2 1 02.2018 2
3 2 01.2018 3
4 2 02.2018 4
5 3 01.2018 5
6 3 02.2018 6
7 4 01.2018 7
8 4 02.2018 8
# an empty list to populate
qlist <- list()
# loop over each distinct date and assign quartile bins within it
# (note: cut(x, 4) splits the price range into 4 equal-width intervals)
for (k in unique(data$date)) {
  subdata <- subset(data, date == k)
  subdata$quartile <- cut(subdata$salePrice, 4, labels = FALSE)
  qlist[[k]] <- subdata
}
# combine the list into a single data frame
df <- do.call("rbind",qlist)
df
idNum date salePrice quartile
01.2018.1 1 01.2018 1 1
01.2018.3 2 01.2018 3 2
01.2018.5 3 01.2018 5 3
01.2018.7 4 01.2018 7 4
02.2018.2 1 02.2018 2 1
02.2018.4 2 02.2018 4 2
02.2018.6 3 02.2018 6 3
02.2018.8 4 02.2018 8 4
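The same per-date ranking can probably be done without an explicit loop. The following is a minimal base R sketch, not part of the original answer: it assumes the fake data from above, uses ave() to apply a function within each date, and reuses the quantile-based cut() from the question. The helper name quartile_rank is made up for illustration.

```r
# Hypothetical helper (name is an assumption): quartile rank of a numeric vector,
# using quantile-based breaks as in the question's own code
quartile_rank <- function(x) {
  as.integer(cut(x, quantile(x, probs = 0:4 / 4), include.lowest = TRUE))
}

# Same fake data as in the answer above
data <- data.frame(
  idNum = c(1, 1, 2, 2, 3, 3, 4, 4),
  date = rep(c("01.2018", "02.2018"), times = 4),
  salePrice = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# ave() applies quartile_rank separately within each date
# and returns the results in the original row order
data$quartile <- ave(data$salePrice, data$date, FUN = quartile_rank)
data$quartile
# [1] 1 1 2 2 3 3 4 4
```

One caveat with this sketch: if a date has many tied prices, the quantile breaks may not be unique and cut() will stop with an error, so real data might need extra handling.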
Answer 1 (score: 1)
An alternative using data.table and findInterval:
library(data.table)
setDT(df)[ ,.(idNum,salePrice,
quartilerank=findInterval(salePrice,quantile(salePrice),all.inside = TRUE)),
by=date]
#returns
date idNum salePrice quartilerank
1: 1.2018 1 1 1
2: 1.2018 2 3 4
3: 2.2018 1 2 1
4: 2.2018 2 4 4
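With only two observations per date, the ranks come out as 1 and 4 because the minimum falls in the first interval and the maximum in the last. It may also help to know that findInterval and cut treat values lying exactly on a break differently: by default findInterval uses intervals closed on the left, while cut uses intervals closed on the right. A small base R sketch on made-up vectors (not the asker's data) illustrates both points:

```r
# Two observations, as in the output above: ranks are 1 and 4
two <- c(1, 3)
rank_two <- findInterval(two, quantile(two), all.inside = TRUE)
rank_two
# [1] 1 4

# Values that coincide with the quantile breaks (breaks here are 1, 2, 3, 4, 5)
x <- c(1, 2, 3, 4, 5)

# findInterval: intervals are [a, b), the maximum is pulled back in by all.inside
rank_fi <- findInterval(x, quantile(x), all.inside = TRUE)
rank_fi
# [1] 1 2 3 4 4

# cut: intervals are (a, b], the minimum is included via include.lowest
rank_cut <- as.integer(cut(x, quantile(x), include.lowest = TRUE))
rank_cut
# [1] 1 1 2 3 4
```

Which convention is appropriate depends on how boundary prices should be ranked; neither is the single correct definition of a quartile rank.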