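I am trying to compute the Gini coefficient using sample weights for different groups in my data. I would prefer to use aggregate because I later use the output of aggregate to plot the coefficients. I have found alternative approaches, but in those cases the output is not quite what I need.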
library(reldist) #to get gini function
dat <- data.frame(country=rep(LETTERS, each=10)[1:50], replicate(3, sample(11, 10)), year=sample(c(1990:1994), 50, TRUE),wght=sample(c(1:5), 50, TRUE))
dat[51,] <- c(NA,11,2,6,1992,3) #add one more row with NA for country
gini(dat$X1) #usual gini for all
gini(dat$X1,weights=dat$wght) #gini with weights, that's what I actually need
print(a1<-aggregate( X1 ~ country+year, data=dat, FUN=gini))
#Works perfectly fine without weight.
But how do I now specify the weights option within aggregate? I know there are other ways (as shown here):
print(b1<-by(dat,list(dat$country,dat$year), function(x)with(x,gini(x$X1,x$wght)))[])
#By function works with weight but now the output has NAs in it
print(s1<-sapply(split(dat, dat$country), function(x) gini(x$X1, x$wght)))
#This seems to be a good alternative but I couldn't find a way to split it by two variables
library(plyr)
print(p1<-ddply(dat,.(country,year),summarise, value=gini(X1,wght)))
#yet another alternative but now the output includes NAs for the missing country
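It would be very helpful if someone could show me a way to use the weighted gini function within aggregate, since aggregate produces the output in exactly the form I need. Otherwise, I guess I will use one of the alternatives.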
Answer 0 (score: 3)
#using aggregate
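# Caution: this gives a different answer than data.table and dplyr.
# weights=dat$wght is passed unchanged to every call of gini, so each group
# receives the full, ungrouped weight vector and the weights do not line up
# with that group's X1 values.
aggregate( X1 ~ country+year, data=dat, FUN=gini,weights=dat$wght)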
#using data.table
library(data.table)
DT<-data.table(dat)
DT[,list(mygini=gini(X1,wght)),by=.(country,year)]
#Using dplyr
library(dplyr)
dat %>%
group_by(country,year)%>%
summarise(mygini=gini(X1,wght))
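
If the aggregate output format is what matters, one possible workaround is to aggregate over row numbers instead of over X1, and let the function look up both the values and the weights for each group. The sketch below is not from the original answer: the small data set is made up for illustration and the row_id helper column is a hypothetical name. It assumes gini from reldist accepts a weights argument, as used in the question.

```r
library(reldist)  # provides gini(x, weights)

# Small illustrative data set (values are made up for this sketch).
dat <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", NA),
  year    = c(1990, 1990, 1990, 1991, 1991, 1991, 1991),
  X1      = c(1, 5, 9, 2, 2, 8, 4),
  wght    = c(1, 2, 3, 3, 1, 1, 2)
)

# Hypothetical helper column: the row number of each observation.
dat$row_id <- seq_len(nrow(dat))

# aggregate() hands FUN the row numbers of one group at a time;
# they are used to pull the matching values and weights out of dat.
res <- aggregate(row_id ~ country + year, data = dat,
                 FUN = function(i) gini(dat$X1[i], weights = dat$wght[i]))

# Rename the result column, which aggregate() names after the left-hand side.
names(res)[names(res) == "row_id"] <- "gini"
print(res)
```

Because the formula interface of aggregate drops rows with missing values by default, the row with NA for country does not appear in the result, and only combinations of country and year that actually occur are returned.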