I have the following dataset:
# A tibble: 450 x 546
matchcode idstd year country wt region income industry sector ownership exporter c201 c202 c203a c203b c203c c203d c2041 c2042 c205a c205b1 c205b2 c205b3 c205b4 c205b5 c205b6 c205b7
<chr+lbl> <dbl> <dbl> <chr+l> <dbl> <dbl+> <dbl+> <dbl+lb> <dbl+> <dbl+lbl> <dbl+lb> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+>
1 "BGD 200~ 2474 2002 Bangla~ 0.7 6 1 3 1 2 1 1994 2 100 0 0 NA 2 NA NA NA NA NA NA NA NA NA
2 "BGD 200~ 2717 2002 Bangla~ 0.9 6 1 2 1 2 2 1986 4 100 0 0 NA 2 NA NA NA NA NA NA NA NA NA
3 "BGD 200~ 2410 2002 Bangla~ NA 6 1 3 1 2 1 1999 4 100 0 0 NA 2 NA NA NA NA NA NA NA NA NA
4 "BRA 200~ 14917 2003 Brazil~ NA 4 2 8 1 2 2 1984 2 100 0 0 0 2 NA 50 1 NA NA NA NA NA NA
5 "BRA 200~ 14546 2003 Brazil~ 1.1 4 2 2 1 2 2 1976 2 100 0 0 0 2 NA 50 1 NA NA NA NA NA NA
6 "BRA 200~ 14709 2003 Brazil~ NA 4 2 3 1 2 2 1990 2 100 0 0 0 2 NA 100 NA 1 NA NA NA NA NA
7 "KHM 200~ 16475 2003 Cambod~ NA 2 1 20 2 2 2 1999 2 100 0 0 0 2 NA 100 NA NA NA 1 NA NA NA
8 "KHM 200~ 16298 2003 Cambod~ NA 2 1 4 3 2 2 1993 4 100 0 0 0 2 NA 100 1 NA NA NA NA NA NA
9 "KHM 200~ 16036 2003 Cambod~ 0.5 2 1 21 2 2 2 1997 2 100 0 0 0 2 NA 100 NA 1 NA NA NA NA NA
10 "CHN 200~ 17862 2002 China2~ 1.2 2 2 18 2 2 2 1993 3 49 0 51 NA NA NA NA NA NA NA NA NA NA NA
I am using the following data.table solution to create country-level data from the individual observations:
cols = sapply(df, is.numeric)   # logical flag per column: is it numeric?
cols = names(cols)[cols]        # keep only the names of the numeric columns
dfclevel = df[, lapply(.SD, mean, na.rm=TRUE), .SDcols = cols, by=matchcode]
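The code runs fine, but my dataset contains weights (wt) for some of the observations, and I would like to incorporate them into this aggregation. I have been thinking about how to do this but cannot figure it out. Is it possible to write a function and plug it into the data.table solution? Something like this: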
dfclevel = df[, lapply(.SD, wfunc, na.rm=TRUE), .SDcols = cols,]
wfunc <- function(x,y) # x = df, y=weights
  for (i in nrow(df$weights)) {
    if (df$weights[i] == !is.na){
      df[,i] <- df[,i]*df$weights[i]
    }
  }
Or maybe I am just overthinking this?
Edit: Based on the comments below, I tried:
dfclevel= df[, lapply(.SD, weighted.mean(x, as.vector(wt), na.rm=TRUE), na.rm=TRUE), .SDcols = cols, by=matchcode]
It gives me this error:
Error in weighted.mean(x, as.vector(wt), na.rm = TRUE) :
object 'x' not found
How do I specify that x should be a column of df?
I also tried the following, but it did not work either:
dfclevel= df[, lapply(.SD, lapply(weighted.mean(x, as.vector(wt), na.rm=TRUE)), na.rm=TRUE), .SDcols = cols, by=matchcode]
Error in match.fun(FUN) : argument "FUN" is missing, with no default
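Both errors seem to come from the same place: lapply() expects a function as its second argument, whereas weighted.mean(x, ...) is a call that R tries to evaluate immediately, before any x exists. Wrapping the call in an anonymous function is one way around that. The sketch below is only an illustration under several assumptions: the toy table and its values are made up, wmean_na is a hypothetical helper name, the weight column is assumed to be wt, and df is assumed to already be a data.table (a tibble would first need data.table::setDT(df) or as.data.table(df)). The helper exists because na.rm = TRUE in weighted.mean() only drops missing values in x, so a missing weight would otherwise turn the whole group result into NA.

```r
library(data.table)

# Toy stand-in for the real data; the values are made up for illustration.
df <- data.table(
  matchcode = c("A", "A", "A", "B", "B"),
  wt        = c(1, 3, NA, 2, 2),
  c201      = c(10, 20, 99, 5, NA),
  c202      = c(2, 4, 6, 1, 3)
)

# Hypothetical helper: weighted mean that drops every pair in which either
# the value or the weight is NA (weighted.mean's na.rm only covers x).
wmean_na <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  if (!any(ok)) return(NA_real_)
  weighted.mean(x[ok], w[ok])
}

# Numeric columns to aggregate, excluding the weight column itself.
cols <- setdiff(names(df)[sapply(df, is.numeric)], "wt")

# lapply() needs a function, so the call is wrapped in an anonymous function:
# x is each column of .SD, and wt is looked up within the current group.
dfclevel <- df[, lapply(.SD, function(x) wmean_na(x, wt)),
               by = matchcode, .SDcols = cols]
print(dfclevel)
```

On this toy table the third row of group A has no weight, so it is ignored and c201 for A works out to (10*1 + 20*3) / 4 = 17.5. Whether observations without a weight should be dropped, or instead given a default weight such as 1, is a modelling decision that the sketch does not settle.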