Using R, left_join does not accept the data type

Posted: 2014-12-04 17:11:33

Tags: r data.table dplyr

I am running a recipe in R that fails with the following error:

> left_join(ann2012full,agglevel)
Joining by: "agglvl_code"
Error in data.table::setkeyv(y, by$x) : x is not a data.table

The two objects are ann2012full (3+ million observations of 15 variables) and agglevel (56 observations of 2 variables), both read from .csv files.

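Other posts describe similar dplyr problems, but how the `by` argument should be specified is not clear to me. Can anyone reproduce the left_join behaviour as it worked before the update?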

The two objects share one column, which the function seems to confirm by printing Joining by: "agglvl_code" before the error:

> intersect(names(ann2012full),names(agglevel))
[1] "agglvl_code"
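A minimal base-R sketch of naming the shared key explicitly instead of relying on automatic detection. The tables below are small made-up stand-ins for ann2012full and agglevel, and the names `ann`, `key` and `joined` are hypothetical. Base `merge()` with `all.x = TRUE` is used here as the left-join equivalent; it is not the dplyr code path from the question.

```r
# Hypothetical toy stand-ins for ann2012full and agglevel
ann <- data.frame(
  area_fips   = c("01000", "01000", "01000"),
  agglvl_code = c(50L, 51L, 99L),   # 99 has no match in the lookup table
  stringsAsFactors = FALSE
)
agglevel <- data.frame(
  agglvl_code  = c(50L, 51L),
  agglvl_title = c("title A", "title B"),
  stringsAsFactors = FALSE
)

# Name the shared key explicitly (here it is just "agglvl_code")
key <- intersect(names(ann), names(agglevel))

# all.x = TRUE keeps every row of ann, as a left join does;
# rows without a match get NA in agglvl_title
joined <- merge(ann, agglevel, by = key, all.x = TRUE)
print(joined)
```

With dplyr, the same key would be passed as `left_join(ann2012full, agglevel, by = "agglvl_code")`. Naming the key only replaces the automatic "Joining by" detection; on its own it is unlikely to change the data.table/data.frame combination that the error message complains about.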

The first few rows of the objects in question:

head(ann2012full)
   area_fips own_code industry_code agglvl_code size_code year qtr disclosure_code annual_avg_estabs_count annual_avg_emplvl
1:     01000        0            10          50         0 2012   A                                  116233           1828248
2:     01000        1            10          51         0 2012   A                                    1252             56031
3:     01000        1           102          52         0 2012   A                                    1252             56031
4:     01000        1          1021          53         0 2012   A                                     599             11734
5:     01000        1          1022          53         0 2012   A                                       2                13
6:     01000        1          1023          53         0 2012   A                                      17               161
   total_annual_wages taxable_annual_wages annual_contributions annual_avg_wkly_wage avg_annual_pay
1:        76768801894          13424728725            419383612                  808          41990
2:         4194319351                    0                    0                 1440          74857
3:         4194319351                    0                    0                 1440          74857
4:          719641114                    0                    0                 1179          61330
5:             436204                    0                    0                  662          34437
6:           12253089                    0                    0                 1468          76343

head(agglevel)
  agglvl_code                                    agglvl_title
1          10                         National, Total Covered
2          11          National, Total -- by ownership sector
3          12      National, by Domain -- by ownership sector
4          13 National, by Supersector -- by ownership sector
5          14   National, NAICS Sector -- by ownership sector
6          15  National, NAICS 3-digit -- by ownership sector

The structure of the objects from str() (ann2012 is shown; ann2012full is similar):
> str(ann2012)
Classes ‘data.table’ and 'data.frame':  3556289 obs. of  15 variables:
 $ area_fips              : chr  "01000" "01000" "01000" "01000" ...
 $ own_code               : int  0 1 1 1 1 1 1 1 1 1 ...
 $ industry_code          : chr  "10" "10" "102" "1021" ...
 $ agglvl_code            : int  50 51 52 53 53 53 53 53 53 53 ...
 $ size_code              : int  0 0 0 0 0 0 0 0 0 0 ...
 $ year                   : int  2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
 $ qtr                    : chr  "A" "A" "A" "A" ...
 $ disclosure_code        : chr  "" "" "" "" ...
 $ annual_avg_estabs_count: int  116233 1252 1252 599 2 17 46 32 27 4 ...
 $ annual_avg_emplvl      : int  1828248 56031 56031 11734 13 161 1799 6131 903 632 ...
 $ total_annual_wages     :Class 'integer64'  num [1:3556289] 3.79e-313 2.07e-314 2.07e-314 3.56e-315 2.16e-318 ...
 $ taxable_annual_wages   :Class 'integer64'  num [1:3556289] 6.63e-314 0.00 0.00 0.00 0.00 ...
 $ annual_contributions   :Class 'integer64'  num [1:3556289] 2.07e-315 0.00 0.00 0.00 0.00 ...
 $ annual_avg_wkly_wage   : int  808 1440 1440 1179 662 1468 1581 1231 370 1716 ...
 $ avg_annual_pay         : int  41990 74857 74857 61330 34437 76343 82237 64031 19257 89240 ...
 - attr(*, ".internal.selfref")=<externalptr> 
> str(agglevel)
'data.frame':   56 obs. of  2 variables:
 $ agglvl_code : int  10 11 12 13 14 15 16 17 18 21 ...
 $ agglvl_title: chr  "National, Total Covered" "National, Total -- by ownership sector" "National, by Domain -- by ownership sector" "National, by Supersector -- by ownership sector" ...
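Before joining, it can help to print the class of each table and the storage type of each shared key side by side. The helper `describe_join()` below is hypothetical (not from any package) and the toy tables are made up. For the objects in the question, `class(x)[1]` would report "data.table" for ann2012 and "data.frame" for agglevel, per the str() output above.

```r
# Hypothetical helper: report each table's class and each key column's storage type
describe_join <- function(x, y, by = intersect(names(x), names(y))) {
  data.frame(
    key     = by,
    x_type  = vapply(by, function(k) typeof(x[[k]]), character(1), USE.NAMES = FALSE),
    y_type  = vapply(by, function(k) typeof(y[[k]]), character(1), USE.NAMES = FALSE),
    x_class = class(x)[1],   # first class, e.g. "data.table" or "data.frame"
    y_class = class(y)[1],
    stringsAsFactors = FALSE
  )
}

# Made-up example: the key is character on one side and integer on the other
ann <- data.frame(agglvl_code = c("50", "51"), stringsAsFactors = FALSE)
agglevel <- data.frame(agglvl_code = c(50L, 51L), agglvl_title = c("a", "b"),
                       stringsAsFactors = FALSE)

info <- describe_join(ann, agglevel)
print(info)
```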

I loaded 10 packages for this recipe; 28 entries are on the search path in total:

> search()
 [1] ".GlobalEnv"             "package:tcltk"          "package:microbenchmark" "package:rbenchmark"     "package:choroplethr"   
 [6] "package:RColorBrewer"   "package:maps"           "package:ggplot2"        "package:stringr"        "package:dplyr"         
[11] "package:plyr"           "package:sqldf"          "package:RSQLite"        "package:DBI"            "package:gsubfn"        
[16] "package:proto"          "package:data.table"     "package:bit64"          "package:bit"            "tools:rstudio"         
[21] "package:stats"          "package:graphics"       "package:grDevices"      "package:utils"          "package:datasets"      
[26] "package:methods"        "Autoloads"              "package:base"  

******************** Found a workaround ********************

I dug into it: I now use merge instead of left_join, and specify `by` explicitly instead of leaving it NULL. Before:

codes <- c('agglevel','industry','ownership','size')
ann2012full <- ann2012
# look up each code table by name and left-join it (this version raised the error)
for (code in codes) {
  ann2012full <- left_join(ann2012full, get(code))
}

Now:

codes <- c('agglevel','industry','ownership','size')
ann2012full <- ann2012
for (code in codes) {
  lookup <- get(code)                                       # fetch the code table by name
  barTitle <- intersect(names(ann2012full), names(lookup))  # shared key column(s)
  # merge() defaults to an inner join; all.x = TRUE would keep unmatched rows as left_join does
  ann2012full <- merge(ann2012full, lookup, by = barTitle)
}
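The merge loop can also be written without eval(parse()): `mget()` fetches the code tables by name and `Reduce()` merges them in one at a time. This is a sketch on made-up data (the table contents and the names `ann`, `ownership` values and `full` are hypothetical), and it deliberately passes `all.x = TRUE` so unmatched rows are kept, which differs from the inner join that `merge()` performs by default in the loop above.

```r
# Hypothetical toy data: a fact table and two small code tables
ann <- data.frame(agglvl_code = c(50L, 51L), own_code = c(0L, 1L))
agglevel <- data.frame(agglvl_code = c(50L, 51L), agglvl_title = c("A", "B"),
                       stringsAsFactors = FALSE)
ownership <- data.frame(own_code = c(0L, 1L), own_title = c("own 0", "own 1"),
                        stringsAsFactors = FALSE)

codes <- c("agglevel", "ownership")

# mget() returns a named list of the tables; Reduce() starts from ann and
# merges each table on whatever columns it shares with the running result
full <- Reduce(
  function(acc, lookup) {
    merge(acc, lookup, by = intersect(names(acc), names(lookup)), all.x = TRUE)
  },
  mget(codes),
  ann
)
print(full)
```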

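However, the *_join functions in dplyr still seem to fail after the latest update. I would be glad to hear other suggestions, since only the modified code using merge works for me.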

Thanks,

1 Answer

Answer 0 (score: 3)

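I think you are working through the same recipe I did and hit the same problem. There are two issues here (at least there were for me when I did it).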

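The first issue is that, if you follow the recipe, one of your data sets is a data.table and the other is a data.frame. So convert the second one with as.data.table().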

The second issue is that the join columns are integer in one data set and character in the other, so they need to be made the same type (just use as.numeric()).
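One way to apply that fix without hard-coding each column is to coerce the key columns of the large table to the type used in the lookup table. `harmonise_keys()` is a hypothetical helper, the data are made up, and it assumes plain (non-factor) columns. Coercing toward the lookup table's type is a choice: for codes with leading zeros, such as area_fips ("01000"), converting to a number would drop the zeros, so such keys are better kept as character on both sides.

```r
# Hypothetical helper: coerce each shared key in x to the storage type it has in y
harmonise_keys <- function(x, y, by = intersect(names(x), names(y))) {
  for (k in by) {
    x[[k]] <- as.vector(x[[k]], mode = typeof(y[[k]]))
  }
  x
}

# Made-up example: character key in the fact table, integer key in the lookup
ann <- data.frame(agglvl_code = c("50", "51"), stringsAsFactors = FALSE)
agglevel <- data.frame(agglvl_code = c(50L, 51L), agglvl_title = c("A", "B"),
                       stringsAsFactors = FALSE)

ann <- harmonise_keys(ann, agglevel)
str(ann$agglvl_code)   # should now be integer, matching agglevel
joined <- merge(ann, agglevel, by = "agglvl_code", all.x = TRUE)
print(joined)
```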

Edit: this code does what you want and works correctly on my machine (except that you need to change 2013 to 2012 to match your data).

codes <- c('agglevel', 'industry', 'ownership', 'size')
ann2013full <- ann2013
# give the join keys the same (numeric) type as in the code tables
ann2013full$agglvl_code <- as.numeric(ann2013full$agglvl_code)
ann2013full$own_code <- as.numeric(ann2013full$own_code)
ann2013full$size_code <- as.numeric(ann2013full$size_code)
# convert each code table to a data.table so both sides of the join match
for (code in codes) {
  ann2013full <- left_join(ann2013full, as.data.table(get(code)))
}