R session aborts after querying data from an AWS Athena table

Asked: 2019-11-10 16:47:02

Tags: r dbi amazon-athena

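I am trying to query data from the demo table elb_logs in AWS Athena.

The connection to the database succeeds. However, if I try to query data with dbReadTable or dbGetQuery, the R session aborts.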

library(DBI)
con <- dbConnect(
  odbc::odbc(),
  Driver             = "/Library/simba/athenaodbc/lib/libathenaodbc_sbu.dylib",
  S3OutputLocation   = "s3://aws-athena-query-results-eu-central-1/",
  AwsRegion          = "eu-central-1",
  AuthenticationType = "IAM Credentials",
  UID                = Sys.getenv("AWS_ACCESS_KEY_ID"),
  PWD                = Sys.getenv("AWS_SECRET_ACCESS_KEY")
)

dbListObjects(con)

dbReadTable(con, name = "elb_logs")
dbGetQuery(con, "SELECT * FROM sampledb.elb_logs")

Created on 2019-11-10 by the reprex package (v0.3.0)

devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       macOS Mojave 10.14.6        
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Berlin               
#>  date     2019-11-10                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
#>  bit           1.1-14  2018-05-29 [1] CRAN (R 3.6.0)
#>  bit64         0.9-7   2017-05-08 [1] CRAN (R 3.6.0)
#>  blob          1.2.0   2019-07-09 [1] CRAN (R 3.6.0)
#>  callr         3.3.2   2019-09-22 [1] CRAN (R 3.6.0)
#>  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
#>  DBI         * 1.0.0   2018-05-02 [1] CRAN (R 3.6.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.0)
#>  digest        0.6.22  2019-10-21 [1] CRAN (R 3.6.1)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
#>  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.0)
#>  hms           0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.0)
#>  knitr         1.25    2019-09-18 [1] CRAN (R 3.6.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
#>  odbc          1.1.6   2018-06-09 [1] CRAN (R 3.6.0)
#>  pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)
#>  ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.0   2019-02-14 [1] CRAN (R 3.6.0)
#>  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
#>  remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
#>  rlang         0.4.1   2019-10-24 [1] CRAN (R 3.6.1)
#>  rmarkdown     1.16    2019-10-01 [1] CRAN (R 3.6.0)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.2.1   2019-07-25 [1] CRAN (R 3.6.0)
#>  tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.0)
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.0)
#>  vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.6.0)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.10    2019-10-01 [1] CRAN (R 3.6.0)
#>  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.0)
#>  zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

This is the error I get when I run it directly in R:

 *** caught segfault ***
address 0x21, cause 'memory not mapped'

Traceback:
 1: new_result(connection@ptr, statement)
 2: OdbcResult(connection = conn, statement = statement)
 3: dbSendQuery(conn, statement, ...)
 4: dbSendQuery(conn, statement, ...)
 5: .local(conn, statement, ...)
 6: dbGetQuery(con, "SELECT elb_name FROM sampledb.elb_logs LIMIT 10")
 7: dbGetQuery(con, "SELECT elb_name FROM sampledb.elb_logs LIMIT 10")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

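Update: I tried another approach, but received a different error message that seems to be more related to the Athena setup. If anyone has already connected to this database, I would appreciate any help.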

library(rJava)
library(RJDBC)
#> Loading required package: DBI
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/JDBC/SimbaAthenaJDBC_2.0.9/AthenaJDBC42_2.0.9.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil, mode="wb")
drv <- JDBC(driverClass="com.simba.athena.jdbc.Driver", classPath = fil, identifier.quote="'")
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-west-2.amazonaws.com:443/',
                                   S3OutputLocation="s3://aws-bmw-athena-results/",
                                   user=Sys.getenv("AWS_ACCESS_KEY_ID"),
                                   password=Sys.getenv("AWS_SECRET_ACCESS_KEY"),
                                   )

dfdhis2=dbGetQuery(con, "SELECT elb_name FROM sampledb.elb_logs LIMIT 10")
#> Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result set for SELECT elb_name FROM sampledb.elb_logs LIMIT 10 ([Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input. [Execution ID not available])

Created on 2019-11-11 by the reprex package (v0.3.0)

2 Answers:

Answer 0 (score: 0)

OK, I am going to close this question, because I found that the session was being aborted by a problem with my R environment.

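Answer 1 (score: 0)

Alternative connection methods

If you want to try another way of connecting to AWS Athena from R, you can try two new packages: noctua and RAthena. They connect to Athena using the AWS SDKs paws and boto3, respectively.

Connecting to Athena

RAthena (boto3 connection method)

Once "AWS_ACCESS_KEY_ID" and "AWS_SECRET_ACCESS_KEY" are set as environment variables, RAthena will pick them up for you.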
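Since both packages rely on those environment variables, it may be worth confirming that the R session can actually see them before connecting. The following is only a minimal sketch in base R; the helper name missing_aws_vars is hypothetical and is not part of RAthena or noctua.

```r
# Hypothetical helper (not provided by RAthena or noctua): return the names
# of the expected AWS credential variables that are unset or empty in the
# current R session.
missing_aws_vars <- function(vars = c("AWS_ACCESS_KEY_ID",
                                      "AWS_SECRET_ACCESS_KEY")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_aws_vars()
if (length(missing) > 0) {
  message("Not set in this session: ", paste(missing, collapse = ", "))
}
```

If a name is reported, setting it with Sys.setenv() or in ~/.Renviron (followed by a restart of R) should make it visible to the session.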

library(DBI)

con <- dbConnect(
  RAthena::athena(),
  s3_staging_dir = "s3://aws-athena-query-results-eu-central-1/",
  region_name    = "eu-central-1")

You can now query AWS Athena through the DBI interface, just as you would with odbc and RJDBC.

dbListObjects(con)
dbGetQuery(con, "SELECT * FROM sampledb.elb_logs")

noctua (paws connection method)

Like RAthena, noctua will pick up the AWS environment variables.

library(DBI)

con <- dbConnect(
  noctua::athena(),
  s3_staging_dir = "s3://aws-athena-query-results-eu-central-1/",
  region_name    = "eu-central-1")

You can now query AWS Athena through the DBI interface, just as you would with odbc and RJDBC.

dbListObjects(con)
dbGetQuery(con, "SELECT * FROM sampledb.elb_logs")
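Both examples select the whole table. While testing a new connection, it may be more convenient to fetch only a few rows, as the question does with LIMIT 10. Here is a rough sketch that should work with either connection object; preview_sql and preview_table are hypothetical helper names, not functions from DBI, RAthena or noctua, and the table name is pasted into the query verbatim, so it should only be used with trusted input.

```r
# Hypothetical helper: build a "first n rows" query for a table.
# The table name is inserted as-is, so pass only trusted names.
preview_sql <- function(table, n = 10L) {
  stopifnot(is.character(table), length(table) == 1L)
  stopifnot(is.numeric(n), length(n) == 1L, n >= 0)
  sprintf("SELECT * FROM %s LIMIT %d", table, as.integer(n))
}

# Hypothetical helper: run the preview query on any DBI connection.
preview_table <- function(con, table, n = 10L) {
  DBI::dbGetQuery(con, preview_sql(table, n))
}

# Usage, assuming `con` was created as shown above:
# preview_table(con, "sampledb.elb_logs")
# DBI::dbDisconnect(con)
```

Keeping the query construction separate from its execution means the SQL can be inspected without an Athena connection.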

Summary

Hopefully this gives you another option for connecting to AWS Athena from R. If you want to check out these packages, see the noctua documentation and the RAthena documentation.