Using weighted.mean inside lapply with data.table

Asked: 2018-09-13 10:01:05

Tags: r data.table mean lapply summarize

I have the following dataset:

# A tibble: 450 x 546
   matchcode idstd year  country wt  region income industry sector ownership exporter c201  c202  c203a c203b c203c c203d c2041 c2042 c205a c205b1 c205b2 c205b3 c205b4 c205b5 c205b6 c205b7
   <chr+lbl> <dbl> <dbl> <chr+l> <dbl> <dbl+> <dbl+> <dbl+lb> <dbl+> <dbl+lbl> <dbl+lb> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+> <dbl+>
 1 "BGD 200~  2474 2002  Bangla~ 0.7    6      1       3       1      2         1        1994  2     100   0      0    NA     2    NA     NA   NA     NA     NA     NA     NA     NA     NA    
 2 "BGD 200~  2717 2002  Bangla~ 0.9    6      1       2       1      2         2        1986  4     100   0      0    NA     2    NA     NA   NA     NA     NA     NA     NA     NA     NA    
 3 "BGD 200~  2410 2002  Bangla~  NA    6      1       3       1      2         1        1999  4     100   0      0    NA     2    NA     NA   NA     NA     NA     NA     NA     NA     NA    
 4 "BRA 200~ 14917 2003  Brazil~  NA    4      2       8       1      2         2        1984  2     100   0      0     0     2    NA     50    1     NA     NA     NA     NA     NA     NA    
 5 "BRA 200~ 14546 2003  Brazil~ 1.1    4      2       2       1      2         2        1976  2     100   0      0     0     2    NA     50    1     NA     NA     NA     NA     NA     NA    
 6 "BRA 200~ 14709 2003  Brazil~  NA    4      2       3       1      2         2        1990  2     100   0      0     0     2    NA    100   NA      1     NA     NA     NA     NA     NA    
 7 "KHM 200~ 16475 2003  Cambod~  NA    2      1      20       2      2         2        1999  2     100   0      0     0     2    NA    100   NA     NA     NA      1     NA     NA     NA    
 8 "KHM 200~ 16298 2003  Cambod~  NA    2      1       4       3      2         2        1993  4     100   0      0     0     2    NA    100    1     NA     NA     NA     NA     NA     NA    
 9 "KHM 200~ 16036 2003  Cambod~ 0.5    2      1      21       2      2         2        1997  2     100   0      0     0     2    NA    100   NA      1     NA     NA     NA     NA     NA    
10 "CHN 200~ 17862 2002  China2~ 1.2    2      2      18       2      2         2        1993  3      49   0     51    NA    NA    NA     NA   NA     NA     NA     NA     NA     NA     NA    

I am using the following data.table solution to aggregate the observation-level data to the country level:

cols = sapply(df, is.numeric) # logical vector: TRUE for every numeric column
cols = names(cols)[cols]      # keep only the names of the numeric columns
dfclevel = df[, lapply(.SD, mean, na.rm=TRUE), .SDcols = cols, by=matchcode]

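The code works fine, but my dataset contains weights (the wt column) for some observations, and I would like to incorporate them into the aggregation. I have been thinking about how to do this, but I cannot figure it out. Is it possible to write a function and plug it into the data.table solution? Something like this: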

dfclevel = df[, lapply(.SD, wfunc, na.rm=TRUE), .SDcols = cols,]

wfunc <- function(x,y)  # x = df, y=weights
for (i in nrow(df$weights) {
  if (df$weights[i] == !is.na){
    df[,i] <- df[,i]*df$weights[i]
  }

Or am I perhaps overthinking this?

Edit: based on the comments below, I tried:

dfclevel= df[, lapply(.SD, weighted.mean(x, as.vector(wt), na.rm=TRUE), na.rm=TRUE), .SDcols = cols, by=matchcode]


This gives me the error:

Error in weighted.mean(x, as.vector(wt), na.rm = TRUE) : 
  object 'x' not found

How do I specify that x should be a column of df?

I also tried the following, but it did not work:

dfclevel= df[, lapply(.SD, lapply(weighted.mean(x, as.vector(wt), na.rm=TRUE)), na.rm=TRUE), .SDcols = cols, by=matchcode] 
Error in match.fun(FUN) : argument "FUN" is missing, with no default
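Both errors above have the same cause: lapply expects a function as its second argument, while weighted.mean(x, ...) is a call that R evaluates immediately, before any column has been bound to x. One possible way around this is to pass a function and let lapply supply each column of .SD as its first argument. The following is a minimal sketch on made-up toy data, not a tested solution for the real survey. The helper name wmean is hypothetical, and the sketch assumes that rows with a missing weight should simply be dropped (weighted.mean with na.rm=TRUE only removes missing values in x, so an NA weight would otherwise make the result NA). The weight column is also excluded from cols so that it is not averaged itself.

```r
library(data.table)

# Toy data standing in for the survey; column names follow the question,
# the values are made up for illustration.
df <- data.table(
  matchcode = c("BGD 2002", "BGD 2002", "BGD 2002", "BRA 2003", "BRA 2003"),
  wt        = c(0.7, 0.9, NA, 1.1, NA),
  c201      = c(1994, 1986, 1999, 1976, 1984),
  c202      = c(2, 4, 4, 2, NA)
)

# Numeric columns to aggregate, excluding the weight column itself.
cols <- setdiff(names(df)[sapply(df, is.numeric)], "wt")

# Hypothetical helper: weighted mean over the rows where both the value
# and the weight are present. Assumption: rows with a missing weight
# are dropped rather than given a default weight.
wmean <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)
  if (!any(ok)) return(NA_real_)
  weighted.mean(x[ok], w[ok])
}

# lapply passes each column of .SD as x; wt is the weight column of the
# current group, which data.table makes available inside j.
dfclevel <- df[, lapply(.SD, wmean, w = wt), by = matchcode, .SDcols = cols]
print(dfclevel)
```

If observations without a weight should instead count with some default weight, the ok filter inside wmean is the place to change; which treatment is appropriate depends on how the survey weights were constructed.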

0 answers:

No answers