Question

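I am trying to compute Gini coefficients using sample weights for different groups in my data. I would prefer to use aggregate because I later use its output to plot the coefficients. I found alternative approaches, but their output is not quite what I need.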

library(reldist) #to get gini function
dat <- data.frame(country=rep(LETTERS, each=10)[1:50], replicate(3, sample(11, 10)), year=sample(c(1990:1994), 50, TRUE),wght=sample(c(1:5), 50, TRUE))
dat[51,] <- c(NA,11,2,6,1992,3) #add one more row with NA for country

gini(dat$X1) #usual gini for all
gini(dat$X1,weights=dat$wght) #gini with weights, that's what I actually need
print(a1<-aggregate( X1 ~ country+year, data=dat, FUN=gini)) 
#Works perfectly fine without weight.

However, how do I now specify the weights option inside aggregate? I know there are other ways (as shown here):

print(b1<-by(dat,list(dat$country,dat$year), function(x)with(x,gini(x$X1,x$wght)))[]) 
#By function works with weight but now the output has NAs in it

print(s1<-sapply(split(dat, dat$country), function(x) gini(x$X1, x$wght))) 
#This seems to be a good alternative but I couldn't find a way to split it by two variables

library(plyr)
print(p1<-ddply(dat,.(country,year),summarise, value=gini(X1,wght))) 
#yet another alternative but now the output includes NAs for the missing country

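It would be very helpful if someone could show me how to use the weighted gini function with aggregate, since it produces output in the form I need. Otherwise, I guess I will use one of the alternatives.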

Answer 1

 #using aggregate
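    # Caution: the result differs from the data.table and dplyr versions below,
    # most likely because the full dat$wght vector (not each group's subset)
    # is passed on to gini for every group.
    aggregate( X1 ~ country+year, data=dat, FUN=gini,weights=dat$wght)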
 #using data.table
    library(data.table)
    DT<-data.table(dat)
    DT[,list(mygini=gini(X1,wght)),by=.(country,year)]

 #Using dplyr
    library(dplyr)
    dat %>%
    group_by(country,year)%>%
    summarise(mygini=gini(X1,wght))
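
One possible way to keep the aggregate output format while still using per-group weights is to aggregate a row-index column and look up both the values and the weights inside FUN. The sketch below is only an illustration under stated assumptions: `wgini` is a hypothetical stand-in for `reldist::gini` so the example runs without the package, `idx` is a helper column that is not in the original data, and the small `dat` here is made up for the demonstration.

```r
# Hypothetical stand-in for reldist::gini(x, weights); in practice call gini() instead.
wgini <- function(x, w = rep(1, length(x))) {
  o  <- order(x)
  x  <- x[o]
  w  <- w[o] / sum(w)        # normalise the weights so they sum to 1
  p  <- cumsum(w)            # cumulative population share
  nu <- cumsum(w * x)        # cumulative weighted total
  n  <- length(nu)
  nu <- nu / nu[n]           # cumulative share of the total
  sum(nu[-1] * p[-n]) - sum(nu[-n] * p[-1])
}

# Small made-up data set with the same column names as in the question
set.seed(1)
dat <- data.frame(
  country = rep(c("A", "B"), each = 6),
  year    = rep(c(1990, 1991), times = 6),
  X1      = sample(11, 12, replace = TRUE),
  wght    = sample(1:5, 12, replace = TRUE)
)

# Helper column (an assumption of this sketch): the row number of each observation
dat$idx <- seq_len(nrow(dat))

# aggregate hands FUN the row numbers of one group at a time,
# so values and weights are subset consistently
res <- aggregate(idx ~ country + year, data = dat,
                 FUN = function(i) wgini(dat$X1[i], dat$wght[i]))
names(res)[3] <- "gini"
print(res)

# The same numbers via split() on two variables;
# drop = TRUE skips country/year combinations that do not occur
s <- sapply(split(dat, list(dat$country, dat$year), drop = TRUE),
            function(d) wgini(d$X1, d$wght))
print(s)
```

Because the formula interface of aggregate omits rows with NA in the grouping variables by default, the row with a missing country should not show up in `res`, which matches the output the question asks for.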
