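I want to compute a new column from the quantiles of another column (a continuous variable) while accounting for a complex survey sample design. The idea is to create a new variable in the data frame that indicates which quantile group each observation falls into.
Here is how I would do it without incorporating the sample design, so you can see what I am aiming for.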
# Load packages and data
library(survey)      # provides the api data sets
library(data.table)
data(api)
# Convert data to data.table format (mostly to increase speed of the process)
apiclus1 <- as.data.table(apiclus1)
# Create deciles variable
apiclus1[, decile := cut(api00,
                         breaks = quantile(api00, probs = seq(0, 1, by = 0.1), na.rm = TRUE),
                         include.lowest = TRUE, labels = 1:10)]
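I tried using svyquantile from the survey package, but I could not work it out. The code below does not return the quantile groups as output that I could feed into a new variable. Any ideas?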
# Load Package
library(survey)
# create survey design
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# What I've tried to do
svyquantile(~api00, design = dclus1, quantiles = seq(0, 1, by=0.1), method = "linear", ties="rounded")
Answer 0 (score: 1)
library(survey)
data(api)
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
a <- svyquantile(~api00, design = dclus1, quantiles = seq(0, 1, by=0.1), method = "linear", ties="rounded")
# use factor() and findInterval()
dclus1 <- update( dclus1 , qtile = factor( findInterval( api00 , a ) ) )
# distribution
svymean( ~ qtile , dclus1 )
# or without the one observation in group number 11
dclus1 <- update( dclus1 , qtile = factor( findInterval( api00 , a[ -length( a ) ] ) ) )
# distribution
svymean( ~ qtile , dclus1 )
# quantiles by group
b <- svyby(~api00, ~stype, design = dclus1, svyquantile, quantiles = seq(0, 0.9 , by=0.1) ,ci=T)
# copy over your data
x <- apiclus1
# stype of each record
match( x$stype , b$stype )
# create the new qtile variable
x$qtile_by_stype <- factor( diag( apply( data.frame( b )[ match( x$stype , b$stype ) , 2:11 ] , 1 , function( v , w ) findInterval( w , v ) , x$api00 ) ) )
# re-create the survey design
dclus1 <- svydesign(id=~dnum, weights=~pw, data=x, fpc=~fpc)
# confirm you have quantiles
svyby( ~ qtile_by_stype , ~ stype , dclus1 , svymean )
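The grouping above depends on how findInterval() treats the breakpoints. Here is a minimal base R sketch of that behaviour. It uses the decile values quoted elsewhere in this thread as breakpoints, typed in by hand, and a few made-up scores (the `scores` vector is hypothetical, not taken from the api data):

```r
# Decile breakpoints copied by hand from the svyquantile() output quoted in this thread
breaks <- c(411, 497.8, 535.6, 573.2, 614.6, 651.75,
            686.6, 709.55, 735.4, 780.7, 905)

# Hypothetical scores chosen to hit the edges of the range
scores <- c(411, 497.8, 600, 780.7, 905)

# Default: intervals are closed on the left, so the maximum gets its own 11th group
default_groups <- findInterval(scores, breaks)

# rightmost.closed = TRUE folds the maximum into the last regular group
closed_groups <- findInterval(scores, breaks, rightmost.closed = TRUE)
```

With the default arguments the maximum lands in an 11th group, which is why the second update() call drops the last breakpoint. Setting `rightmost.closed = TRUE` is another way to keep the maximum in group 10.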
Answer 1 (score: 0)
The output of your code above is:
        0   0.1   0.2   0.3   0.4    0.5   0.6    0.7   0.8   0.9   1
api00 411 497.8 535.6 573.2 614.6 651.75 686.6 709.55 735.4 780.7 905
You can rename these to represent your groups. 0 and 1 are the minimum and maximum; 0.1 is decile 1, 0.2 is decile 2, and so on. For example:
dt_quantile = svyquantile(~api00, design = dclus1, quantiles = seq(0, 1, by=0.1), method = "linear", ties="rounded")
dt_quantile = data.table(dt_quantile)
setnames(dt_quantile, c("min",paste0("decile",1:10)))
dt_quantile = data.table(t(dt_quantile), keep.rownames = TRUE)
dt_quantile
# rn V1
# 1: min 411.00
# 2: decile1 497.80
# 3: decile2 535.60
# 4: decile3 573.20
# 5: decile4 614.60
# 6: decile5 651.75
# 7: decile6 686.60
# 8: decile7 709.55
# 9: decile8 735.40
# 10: decile9 780.70
# 11: decile10 905.00
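To get from these breakpoints to the group variable the question asks for, one option is to pass them to cut(), as in the unweighted version. This is only a sketch: it assumes the breakpoints printed above are typed in by hand, and the `scores` vector is hypothetical rather than taken from the api data.

```r
# Survey-weighted decile breakpoints, copied by hand from the table above
breaks <- c(411, 497.8, 535.6, 573.2, 614.6, 651.75,
            686.6, 709.55, 735.4, 780.7, 905)

# Hypothetical scores to classify
scores <- c(411, 500, 600, 781, 905)

# include.lowest = TRUE keeps the minimum in group 1 instead of turning it into NA
decile <- cut(scores, breaks = breaks, include.lowest = TRUE, labels = 1:10)
```

Note that cut() closes its intervals on the right by default, so a value exactly equal to a breakpoint goes into the lower group, whereas findInterval() would put it in the upper one.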
Am I missing what you are trying to do?