Connecting to AWS Athena from R

Date: 2017-03-01 18:33:24

Tags: r amazon-web-services amazon-athena

I'm trying to connect to AWS Athena from R based on what I've read online, but I'm running into problems.

Steps taken

  • Updated Java
  • Replaced user / pass with accesskey / secretKey
  • Passed accesskey / secretKey through user / pass

Any ideas?

Error message:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: AWS accessId/secretKey or AWS credentials provider must be provided

System information

  sysname              release               version
  "Linux"              "4.4.0-62-generic"    "#83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 2017"
  nodename             machine               login
  "ip-***-**-**-***"   "x86_64"              "unknown"
  user                 effective_user
  "rstudio"            "rstudio"

Code: https://www.r-bloggers.com/interacting-with-amazon-athena-from-r/

library(RJDBC)

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)

drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir="s3://mybucket",
                                   user=Sys.getenv("myuser"),
                                   password=Sys.getenv("mypassword"))

1 answer:

Answer 0 (score: 9)

The Athena JDBC driver expects your AWS access key ID as user and your secret key as password:

accessKeyId <- "your access key id..."
secretKey <- "your secret key..."

jdbcConnection <- dbConnect(
  drv, 
  'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
  s3_staging_dir="s3://mybucket",
  user=accessKeyId,
  password=secretKey
)

The R-bloggers article reads these values from environment variables using Sys.getenv("ATHENA_USER") and Sys.getenv("ATHENA_PASSWORD"), but that is optional.
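One thing worth checking when going the environment-variable route: Sys.getenv() returns an empty string for a variable that is not set, so dbConnect would silently receive empty credentials. That may be what triggers the "must be provided" error in the question, though this is a guess rather than a confirmed diagnosis. A minimal sketch of a guard, where get_required_env is a hypothetical helper name and not part of RJDBC or the Athena driver:

```r
# Hypothetical helper: read a required environment variable and
# fail loudly if it is unset or empty, instead of returning "".
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set or is empty", call. = FALSE)
  }
  value
}

# Usage, assuming the variable names used in the R-bloggers article:
# accessKeyId <- get_required_env("ATHENA_USER")
# secretKey   <- get_required_env("ATHENA_PASSWORD")
```

With this in place, a missing variable stops the script before dbConnect is called, which makes the cause easier to spot than the JDBC exception.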

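Update: using a credentials provider with the Athena driver from R

@Sam is correct that a credentials provider is the best practice for handling AWS credentials. I recommend DefaultAWSCredentialsProviderChain, which covers several options for loading credentials, such as CLI profiles and environment variables.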

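  1. Download the AWS SDK for Java, specifically the SDK JAR (in the lib directory) and the third-party dependency JARs (in the third-party/lib directory).
  2. Add some R code that adds all of the JAR files to rJava's classpath: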

    # Load JAR Files
    library("rJava")
    
    .jinit()
    
    # Load AWS SDK jar
    .jaddClassPath("/path/to/aws-java-sdk-1.11.98/lib/aws-java-sdk-1.11.98.jar")
    
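    # Add third-party JARs (escape the dot so only files ending in .jar match)
    jarFilePaths <- dir("/path/to/aws-java-sdk-1.11.98/third-party/lib/", full.names=TRUE, pattern="\\.jar$")
    # Iterate over the paths directly; this is also safe when no JARs are found
    for (jarPath in jarFilePaths) {
        .jaddClassPath(jarPath)
    }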
    
  3. Configure the Athena driver to load the credentials provider class by name:

    athenaConn <- dbConnect(
      athenaDriver,  # the driver object returned by JDBC() (named drv in the earlier code)
      'jdbc:awsathena://athena.us-east-1.amazonaws.com:443',
      s3_staging_dir="s3://mybucket",
      aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
      )
    
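  4. Setting the classpath is the key step. When dbConnect runs, the Athena driver tries to load the named class from the JARs, which in turn loads all of its dependencies. If the classpath does not include the SDK JAR, you will see an error like:

    Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.auth.DefaultAWSCredentialsProviderChain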

    If the third-party JARs are not on the classpath, you may see an error like:

    Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
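Since both errors come from JARs missing from the classpath, it can help to verify the files on disk before calling dbConnect. The following is only a sketch under assumptions: collect_sdk_jars is a hypothetical helper name, and it assumes the SDK was unpacked with the lib and third-party/lib layout described in step 1.

```r
# Hypothetical helper: gather the SDK JAR plus all third-party JARs,
# stopping early with a readable message if anything is missing.
collect_sdk_jars <- function(sdk_dir, sdk_jar_name) {
  sdk_jar <- file.path(sdk_dir, "lib", sdk_jar_name)
  if (!file.exists(sdk_jar)) {
    stop("SDK JAR not found: ", sdk_jar, call. = FALSE)
  }
  third_party <- list.files(
    file.path(sdk_dir, "third-party", "lib"),
    pattern = "\\.jar$",   # only files ending in .jar
    full.names = TRUE
  )
  if (length(third_party) == 0) {
    stop("No third-party JARs found under: ", sdk_dir, call. = FALSE)
  }
  c(sdk_jar, third_party)
}

# Usage, with the placeholder path from the steps above:
# jars <- collect_sdk_jars("/path/to/aws-java-sdk-1.11.98", "aws-java-sdk-1.11.98.jar")
# for (jar in jars) rJava::.jaddClassPath(jar)
```

Checking up front turns a NoClassDefFoundError raised inside the JVM into an R error that names the missing file or directory.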