I am trying to connect to Amazon Athena over JDBC from R using the RJDBC library. I have the following:
download.file('https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar','AthenaJDBC41-1.0.0.jar' )
jdbcDriver <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", 'AthenaJDBC41-1.0.0.jar',
identifier.quote="'")
Then I connect using my credentials:
jdbcConnection <- dbConnect(jdbcDriver, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
"s3_staging_dir URL", "s3://testbucket/","
"USERNAME"," USERKEY","PASSWORD","PASSWORDKEY" )
But I keep getting this error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: property s3_staging_dir must be set
I tried setting s3_staging_dir in the connection call, but it did not work.
Any guidance is much appreciated.
Answer 0 (score: 3)
library(RJDBC)
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir="s3://yourbucket",
                                   user=Sys.getenv("ATHENA_USER"),
                                   password=Sys.getenv("ATHENA_PASSWORD"))
dbListTables(con)
## [1] "elb_logs"
Put your access key and secret key in .Renviron (in the obviously named environment variables), restart R, and try the above (using a bucket you have access to).
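If the variables are not picked up after the restart, Sys.getenv() silently returns empty strings and the connection fails later with a less obvious message. A minimal sketch of an early check, assuming the variable names ATHENA_USER and ATHENA_PASSWORD used above; the helper name get_athena_credentials is hypothetical and not part of RJDBC:

```r
# Hypothetical helper (not part of RJDBC): read the two credentials from
# environment variables and stop early if either one is unset or empty.
get_athena_credentials <- function(user_var = "ATHENA_USER",
                                   password_var = "ATHENA_PASSWORD") {
  vars <- c(user_var, password_var)
  vals <- Sys.getenv(vars, unset = NA, names = TRUE)
  missing <- vars[is.na(vals) | vals == ""]
  if (length(missing) > 0) {
    stop("Unset environment variable(s): ",
         paste(missing, collapse = ", "),
         ". Add them to ~/.Renviron and restart R.")
  }
  list(user = vals[[user_var]], password = vals[[password_var]])
}
```

The returned list can then supply the user and password arguments of dbConnect() in place of the direct Sys.getenv() calls.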
# dplyr::glimpse() is called directly so the snippet does not depend on %>% being attached
dplyr::glimpse(dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10"))
## Observations: 10
## Variables: 16
## $ timestamp <chr> "2014-09-27T00:00:25.424956Z", "2014-09-27T00:00:56.439218Z", "2014-09-27T00:01:27.441734Z", "2014-09-27T00:01:58.366715Z", "2014-09-27T00:02:29.446363Z", "2014-09-2...
## $ elbname <chr> "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo"
## $ requestip <chr> "241.230.198.83", "252.26.60.51", "250.244.20.109", "247.59.58.167", "254.64.224.54", "245.195.140.77", "245.195.140.77", "243.71.49.173", "240.139.5.14", "251.192.4...
## $ requestport <dbl> 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026
## $ backendip <chr> "251.192.40.76", "249.89.116.3", "251.111.156.171", "251.139.91.156", "251.111.156.171", "254.64.224.54", "254.64.224.54", "250.244.20.109", "247.65.176.249", "250.2...
## $ backendport <dbl> 443, 8888, 8888, 8888, 8000, 8888, 8888, 8888, 8888, 8888
## $ requestprocessingtime <dbl> 9.1e-05, 9.4e-05, 8.4e-05, 9.7e-05, 9.1e-05, 9.3e-05, 9.4e-05, 8.3e-05, 9.0e-05, 9.0e-05
## $ backendprocessingtime <dbl> 0.046598, 0.038973, 0.047054, 0.039845, 0.061461, 0.037791, 0.047035, 0.048792, 0.045724, 0.029918
## $ clientresponsetime <dbl> 4.9e-05, 4.7e-05, 4.9e-05, 4.9e-05, 4.0e-05, 7.7e-05, 7.5e-05, 7.3e-05, 4.0e-05, 6.7e-05
## $ elbresponsecode <chr> "200", "200", "200", "200", "200", "200", "200", "200", "200", "200"
## $ backendresponsecode <chr> "200", "200", "200", "200", "200", "400", "400", "200", "200", "200"
## $ receivedbytes <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ sentbytes <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
## $ requestverb <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET"
## $ url <chr> "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http:/...
## $ protocol <chr> "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1"
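A likely reason the named arguments matter: as far as I understand RJDBC, its dbConnect() method takes url, user and password as leading arguments and passes only the named extra arguments on to the driver as connection properties. The sketch below is a stand-in that mimics that argument shape without opening a connection; collect_properties is a hypothetical function written for illustration, not RJDBC code.

```r
# Illustration only: mimics the (url, user, password, ...) argument shape
# and reports which extra arguments carry a property name.
collect_properties <- function(url, user = "", password = "", ...) {
  extra <- list(...)
  nms <- names(extra)
  if (is.null(nms)) nms <- rep("", length(extra))
  list(user = user, password = password, properties = extra[nzchar(nms)])
}

url <- "jdbc:awsathena://athena.us-east-1.amazonaws.com:443/"

# Positional strings, as in the question: the first two fill user and
# password, and the remaining ones have no names to act as property keys.
bad <- collect_properties(url, "s3_staging_dir URL", "s3://testbucket/",
                          "USERNAME", "USERKEY")

# Named arguments, as in this answer: s3_staging_dir is kept as a property.
good <- collect_properties(url, s3_staging_dir = "s3://yourbucket",
                           user = "USERKEY", password = "PASSWORDKEY")
```

Here bad$properties is empty while good$properties contains s3_staging_dir, which would be consistent with the "property s3_staging_dir must be set" error in the question.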
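Answer 1 (score: 1)
@Mike.Gahan
I ran into the same problem. I solved it by making sure my Java JDK was version 8.0 or later and by using AthenaJDBC42-2.0.7.jar, which is available for download from AWS.
You can check your current JDK version with: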
java -version
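The same check can be scripted from R. A minimal sketch, assuming the version line has the usual form `java version "1.8.0_252"` or `openjdk version "11.0.2"`; parse_java_major is a hypothetical helper name, and other vendors may format the line differently:

```r
# Hypothetical helper: extract the major Java version from the first line
# printed by `java -version`. Legacy "1.x" strings map to x (1.8 -> 8).
parse_java_major <- function(version_line) {
  ver <- sub('.*version "([^"]+)".*', "\\1", version_line)
  parts <- strsplit(ver, "[._]")[[1]]
  major <- as.integer(parts[1])
  if (major == 1L && length(parts) > 1) major <- as.integer(parts[2])
  major
}

# Possible usage (not run here; `java -version` prints to stderr):
# out <- system2("java", "-version", stdout = TRUE, stderr = TRUE)
# parse_java_major(out[1]) >= 8
```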
Note: many instructions online say to use "com.amazonaws.athena.jdbc.AthenaDriver" for driverClass. I could not get that to work. Instead, I tried "com.simba.athena.jdbc.Driver" and was able to connect over JDBC.
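Going by the two combinations reported in this thread (the 1.0.0 JAR with the amazonaws class, the 2.0.7 JAR with the Simba class), the driver class appears to depend on the driver's major version. A small sketch that picks the class from the JAR file name under that assumption; athena_driver_class is a hypothetical helper, and versions other than the two mentioned here are untested:

```r
# Hypothetical helper: choose the driverClass from the JAR file name,
# assuming 1.x JARs use the older class and 2.x (or later) the Simba class.
athena_driver_class <- function(jar) {
  file <- basename(jar)
  version <- sub("^AthenaJDBC[0-9]+[-_]([0-9.]+)\\.jar$", "\\1", file)
  if (identical(version, file)) {
    stop("Unrecognised Athena JDBC JAR name: ", jar)
  }
  major <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]][1])
  if (major >= 2L) {
    "com.simba.athena.jdbc.Driver"
  } else {
    "com.amazonaws.athena.jdbc.AthenaDriver"
  }
}
```

The result can be passed as the driverClass argument of JDBC() together with the same JAR path.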
Below is the code I used to get the connector running.
library(rJava)
library(RJDBC)
library(plyr)
library(dplyr)
drv <- JDBC(driverClass="com.simba.athena.jdbc.Driver", "AthenaJDBC42_2.0.7.jar", identifier.quote="'")
# Connect to Athena using the driver, the S3 staging directory, and your Athena credentials.
# Replace the placeholders below with your own values (the S3 location is the prefix you set up in your bucket).
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-west-2.amazonaws.com:443/',
                                   s3_staging_dir="s3://xxxxx",
                                   user='xxxxx',
                                   password='xxxxxx')
# get a list of all tables currently in Athena
dbListTables(con)
# run a sample query
dfelb=dbGetQuery(con, "SELECT * FROM sample limit 10")
head(dfelb,2)
Alternatively, you can use the odbc R package together with the Simba ODBC driver.
You may need alien to install the RPM file:
sudo apt-get install alien
sudo alien -i simbaathena-1.0.5.1006-1.x86_64.rpm
Note: you must have iODBC 3.52.9, 3.52.10, 3.52.11, or 3.52.12, or unixODBC 2.3.2, 2.3.3, or 2.3.4 installed. I used iODBC on 64-bit Ubuntu 18.04 and installed it with:
sudo apt-get install libxml2-dev
You can check whether the Simba Athena ODBC driver is installed:
dpkg -l | grep simbaathenaodbc
Code using odbc:
library(odbc)
library(tidyverse)
DBI::dbConnect(
odbc::odbc(),
driver = "/opt/simba/athenaodbc/lib/64/libathenaodbc_sb64.so",
Schema = "default",
AwsRegion = "us-west-2",
AuthenticationType = "Default Credentials",
S3OutputLocation = "s3://xxxx"
) -> con
(employee <- tbl(con, sql("SELECT * FROM test")))