I'm running a recipe in R and get the following error:
> left_join(ann2012full,agglevel)
Joining by: "agglvl_code"
Error in data.table::setkeyv(y, by$x) : x is not a data.table
The two objects are ann2012full, with 3 million+ obs. of 15 variables, and agglevel, with 56 obs. of 2 variables. Both were read from .csv files.
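Judging by other posts, there are some other dplyr issues about similar problems, but how the by argument is framed there isn't clear to me. Has anyone been able to reproduce the left_join behavior from before the update?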
The two objects share exactly one column name, which the function seems to confirm by reporting Joining by: "agglvl_code" before the error:
> intersect(names(ann2012full),names(agglevel))
[1] "agglvl_code"
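Because intersect() only compares column names, it cannot reveal a mismatch in the object classes or in the key column's type. The following is a minimal base-R diagnostic sketch; inspect_join() is a hypothetical helper (not part of dplyr or the recipe), and the two toy data frames are made-up stand-ins for ann2012full and agglevel with a deliberately mismatched key type.

```r
# Hypothetical helper: report the classes of both objects and of each shared key column
inspect_join <- function(x, y) {
  keys <- intersect(names(x), names(y))
  key_classes <- data.frame(
    key = keys,
    # class of each shared column in x and in y (first class only)
    x = vapply(keys, function(k) class(x[[k]])[1], character(1), USE.NAMES = FALSE),
    y = vapply(keys, function(k) class(y[[k]])[1], character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  list(x_class = class(x), y_class = class(y), keys = keys, key_classes = key_classes)
}

# Toy stand-ins (made-up data): the key is integer on one side, character on the other
big <- data.frame(agglvl_code = c(50L, 51L, 10L),
                  annual_avg_emplvl = c(1828248L, 56031L, 100L))
lookup <- data.frame(agglvl_code = c("10", "11"),
                     agglvl_title = c("National, Total Covered",
                                      "National, Total -- by ownership sector"),
                     stringsAsFactors = FALSE)

info <- inspect_join(big, lookup)
print(info$key_classes)
```

With the toy data, key_classes reports agglvl_code as integer in big and character in lookup. On the real objects, a data.table on only one side would show up in x_class versus y_class.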
The first few rows of the objects in question...
head(ann2012full)
area_fips own_code industry_code agglvl_code size_code year qtr disclosure_code annual_avg_estabs_count annual_avg_emplvl
1: 01000 0 10 50 0 2012 A 116233 1828248
2: 01000 1 10 51 0 2012 A 1252 56031
3: 01000 1 102 52 0 2012 A 1252 56031
4: 01000 1 1021 53 0 2012 A 599 11734
5: 01000 1 1022 53 0 2012 A 2 13
6: 01000 1 1023 53 0 2012 A 17 161
total_annual_wages taxable_annual_wages annual_contributions annual_avg_wkly_wage avg_annual_pay
1: 76768801894 13424728725 419383612 808 41990
2: 4194319351 0 0 1440 74857
3: 4194319351 0 0 1440 74857
4: 719641114 0 0 1179 61330
5: 436204 0 0 662 34437
6: 12253089 0 0 1468 76343
head(agglevel)
agglvl_code agglvl_title
1 10 National, Total Covered
2 11 National, Total -- by ownership sector
3 12 National, by Domain -- by ownership sector
4 13 National, by Supersector -- by ownership sector
5 14 National, NAICS Sector -- by ownership sector
6 15 National, NAICS 3-digit -- by ownership sector
The same objects with str()...
> str(ann2012)
Classes ‘data.table’ and 'data.frame': 3556289 obs. of 15 variables:
$ area_fips : chr "01000" "01000" "01000" "01000" ...
$ own_code : int 0 1 1 1 1 1 1 1 1 1 ...
$ industry_code : chr "10" "10" "102" "1021" ...
$ agglvl_code : int 50 51 52 53 53 53 53 53 53 53 ...
$ size_code : int 0 0 0 0 0 0 0 0 0 0 ...
$ year : int 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012 ...
$ qtr : chr "A" "A" "A" "A" ...
$ disclosure_code : chr "" "" "" "" ...
$ annual_avg_estabs_count: int 116233 1252 1252 599 2 17 46 32 27 4 ...
$ annual_avg_emplvl : int 1828248 56031 56031 11734 13 161 1799 6131 903 632 ...
$ total_annual_wages :Class 'integer64' num [1:3556289] 3.79e-313 2.07e-314 2.07e-314 3.56e-315 2.16e-318 ...
$ taxable_annual_wages :Class 'integer64' num [1:3556289] 6.63e-314 0.00 0.00 0.00 0.00 ...
$ annual_contributions :Class 'integer64' num [1:3556289] 2.07e-315 0.00 0.00 0.00 0.00 ...
$ annual_avg_wkly_wage : int 808 1440 1440 1179 662 1468 1581 1231 370 1716 ...
$ avg_annual_pay : int 41990 74857 74857 61330 34437 76343 82237 64031 19257 89240 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(agglevel)
'data.frame': 56 obs. of 2 variables:
$ agglvl_code : int 10 11 12 13 14 15 16 17 18 21 ...
$ agglvl_title: chr "National, Total Covered" "National, Total -- by ownership sector" "National, by Domain -- by ownership sector" "National, by Supersector -- by ownership sector" ...
I loaded 10 packages for this recipe; there are 28 entries on the search path in all.
> search()
[1] ".GlobalEnv" "package:tcltk" "package:microbenchmark" "package:rbenchmark" "package:choroplethr"
[6] "package:RColorBrewer" "package:maps" "package:ggplot2" "package:stringr" "package:dplyr"
[11] "package:plyr" "package:sqldf" "package:RSQLite" "package:DBI" "package:gsubfn"
[16] "package:proto" "package:data.table" "package:bit64" "package:bit" "tools:rstudio"
[21] "package:stats" "package:graphics" "package:grDevices" "package:utils" "package:datasets"
[26] "package:methods" "Autoloads" "package:base"
*********************************** Found a work-around ***********************************
I dug into it: I used merge instead of left_join, specifying by explicitly instead of leaving it at the default NULL. What used to be...
codes <- c('agglevel','industry','ownership','size')
ann2012full <- ann2012
for(i in 1:length(codes)){
eval(parse(text=paste('ann2012full <- left_join(ann2012full, ',codes[i],')', sep='')))
}
Is now...
codes <- c('agglevel','industry','ownership','size')
ann2012full <- ann2012
for(i in 1:length(codes)){
barTitle <- intersect(names(ann2012full),names(eval(parse(text=codes[i]))))
eval(parse(text= paste('ann2012full <- merge(ann2012full, ',codes[i],',by="',barTitle,'")', sep='')))
}
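The same loop can be written without building code strings. This is a minimal sketch, assuming the lookup tables exist as objects whose names are listed in codes (the toy data here are made up, and left_merge() is a hypothetical helper): mget() fetches the objects by name, and Reduce() folds each one into the result. Note that merge() defaults to an inner join, so rows whose code is missing from a lookup table are silently dropped; all.x = TRUE gives the keep-every-left-row behavior that left_join() provides.

```r
# Toy stand-ins (made-up data) for ann2012 and two of the lookup tables
ann2012 <- data.frame(
  agglvl_code = c(10L, 11L, 99L),   # 99 has no match in agglevel
  own_code = c(0L, 1L, 1L),
  emplvl = c(100L, 200L, 300L)
)
agglevel <- data.frame(agglvl_code = c(10L, 11L),
                       agglvl_title = c("National, Total Covered",
                                        "National, Total -- by ownership sector"),
                       stringsAsFactors = FALSE)
ownership <- data.frame(own_code = c(0L, 1L),
                        own_title = c("Total Covered", "Federal Government"),
                        stringsAsFactors = FALSE)

codes <- c("agglevel", "ownership")

# mget() returns a named list of the objects, so no eval(parse()) is needed
lookups <- mget(codes)

# Hypothetical helper: left join on whatever columns x and y share.
# Both sides are converted to plain data frames so merge.data.frame is always used;
# all.x = TRUE keeps unmatched rows of x (merge's default is an inner join).
left_merge <- function(x, y) {
  merge(as.data.frame(x), as.data.frame(y),
        by = intersect(names(x), names(y)), all.x = TRUE, sort = FALSE)
}

ann2012full <- Reduce(left_merge, lookups, init = ann2012)
```

Computing by from intersect() per step mirrors what the barTitle variable does in the loop above.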
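However, the *_join functions in dplyr still seem to have this error after the latest update. If anyone has other input, I'd be glad to hear it, since this only works with the modified merge code.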
Thanks,
Answer 0 (score: 3)
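I think you're working through the same exercise I did and ran into the same problem. There are two issues here (at least there were for me when I did it).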
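The first issue is that if you follow the recipe, one of your data sets is a data.table and the other is a data frame, so convert the second one with as.data.table().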
The second issue is that the fields being joined on are integers in one data set and characters in the other, so that needs fixing too (I just used as.numeric()).
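Both fixes can be folded into one helper. This is a sketch, not the answer's code: coerce_like() and left_join_aligned() are hypothetical names, and instead of converting the large table's columns with as.numeric(), it converts the small lookup table's key columns to whatever type the large table already uses, then joins two plain data frames with merge(all.x = TRUE).

```r
# Hypothetical helper: convert `value` to the same basic type as `template`
coerce_like <- function(value, template) {
  # as.integer() on a factor would return level indices, so go via character first
  if (is.factor(value)) value <- as.character(value)
  if (is.integer(template)) {
    as.integer(value)
  } else if (is.numeric(template)) {
    as.numeric(value)
  } else {
    as.character(value)
  }
}

# Hypothetical helper: make both inputs plain data frames, align the lookup's
# key types to the big table's, then do a left join
left_join_aligned <- function(big, lookup) {
  big <- as.data.frame(big)
  lookup <- as.data.frame(lookup)
  keys <- intersect(names(big), names(lookup))
  for (k in keys) {
    lookup[[k]] <- coerce_like(lookup[[k]], big[[k]])
  }
  merge(big, lookup, by = keys, all.x = TRUE)
}

# Toy stand-ins (made-up data): integer key in big, character key in lookup
big <- data.frame(agglvl_code = c(50L, 10L), emplvl = c(1828248L, 100L))
lookup <- data.frame(agglvl_code = c("10", "11"),
                     agglvl_title = c("National, Total Covered",
                                      "National, Total -- by ownership sector"),
                     stringsAsFactors = FALSE)

joined <- left_join_aligned(big, lookup)
```

Converting the lookup side only touches a few dozen rows and leaves the large table's code columns untouched; a blanket as.numeric() on a character code column would, for example, turn an area_fips value like "01000" into 1000.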
Edit: This code does what you want and works on my machine (except that you'll need to change 2013 to 2012 to match your data).
codes <- c('agglevel', 'industry', 'ownership', 'size')
ann2013full <- ann2013
ann2013full$agglvl_code <- as.numeric(ann2013full$agglvl_code)
ann2013full$own_code <- as.numeric(ann2013full$own_code)
ann2013full$size_code <- as.numeric(ann2013full$size_code)
for(code in codes){
# get() looks the lookup table up by name, so no string-building or eval(parse()) is needed
ann2013full <- left_join(ann2013full, as.data.table(get(code)))
}