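Unable to query AWS Athena from Python 2.7 when passing an AWS session token to pyathenajdbc.connect()

Time: 2017-04-11 19:46:45

Tags: python-2.7 jdbc credentials amazon-athena multi-factor

I am trying to connect to Athena using pyathenajdbc.connect(). My AWS credentials are set up with multi-factor authentication. When I do not include the AWS token in the connection string, I get the following error.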

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

ERROR: pyathenajdbc.error.DatabaseError: The security token included in the request is invalid. (Service: AmazonAthena; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 0d488c0b-1eed-11e7-bad8-711e54af6b73)

When I do include the AWS token in the connection string, I get the following error:

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, token=AWS_SESSION_TOKEN, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

ERROR: pyathenajdbc.error.DatabaseError: The security token included in the request is invalid. (Service: AmazonAthena; Status Code: 400; Error Code: UnrecognizedClientException; Request ID: 91751051-1eed-11e7-8347-153dfe3d84a6)

Does anyone know what is wrong here?

Here is my entire code.

from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas
from boto3 import Session
import jpype
jvm_path = jpype.getDefaultJVMPath()

_current_credentials = Session().get_credentials()
AWS_KEY_ID = _current_credentials.access_key
AWS_SECRET = _current_credentials.secret_key
AWS_SESSION_TOKEN = _current_credentials.token
REGION = "us-east-2"

#athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

athena_conn = connect(access_key=AWS_KEY_ID, secret_key=AWS_SECRET, token=AWS_SESSION_TOKEN, s3_staging_dir='s3://abc-pqr-xyz/processed/athena-outputs/',region_name=REGION)

cursor = athena_conn.cursor();
query = 'SELECT * FROM xyz.ABC  limit 1;'
cursor.execute(query)
df = as_pandas(cursor)
print(df)

3 Answers:

Answer 0 (score: 2)

from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas
from boto3 import Session
import os

# Resolve the current credentials through boto3's default lookup chain.
_current_credentials = Session().get_credentials()

# Export them so the driver's environment-variable provider can read them.
os.environ['AWS_ACCESS_KEY_ID'] = _current_credentials.access_key
os.environ['AWS_SECRET_ACCESS_KEY'] = _current_credentials.secret_key
os.environ['AWS_SESSION_TOKEN'] = _current_credentials.token

# No keys are passed to connect(); the provider class named below is used instead.
athena_conn = connect(s3_staging_dir='s3://your-bucket/',
                      region_name='us-west-2',
                      aws_credentials_provider_class='com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider')

cursor = athena_conn.cursor()
query = 'SELECT * FROM schema.table_name limit 1;'
cursor.execute(query)
df = as_pandas(cursor)
print(df)
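
One caveat with the snippet above: os.environ only accepts strings, so assigning the token raises a TypeError when the session holds long-term credentials and token is None. Here is a minimal sketch of a guard. export_credentials is a hypothetical helper (not part of boto3 or PyAthenaJDBC), and creds is assumed to be any object exposing access_key, secret_key and token attributes.

```python
import os


def export_credentials(creds, environ=None):
    """Copy credentials into environment variables, skipping a missing token.

    Hypothetical helper: `creds` is assumed to expose `access_key`,
    `secret_key` and `token` attributes.
    """
    if environ is None:
        environ = os.environ
    environ['AWS_ACCESS_KEY_ID'] = creds.access_key
    environ['AWS_SECRET_ACCESS_KEY'] = creds.secret_key
    if creds.token:
        environ['AWS_SESSION_TOKEN'] = creds.token
    else:
        # No token available: remove any leftover value so that a token
        # from an earlier session is not reused by accident.
        environ.pop('AWS_SESSION_TOKEN', None)
    return environ
```

Calling export_credentials(Session().get_credentials()) would then replace the three os.environ assignments above. The optional environ argument only exists so the helper can be tried against a plain dict.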

Answer 1 (score: 2)

Assuming you have a configuration file under ~/.aws with the region defined, you can use Session().region_name.

The following works fine (no need to import os):

from pyathenajdbc import connect
from pyathenajdbc.util import as_pandas
from boto3 import Session
import jpype
jvm_path = jpype.getDefaultJVMPath()

_current_credentials = Session().get_credentials()
AWS_KEY_ID = _current_credentials.access_key
AWS_SECRET = _current_credentials.secret_key
REGION = Session().region_name

athena_conn = connect(access_key=AWS_KEY_ID,
               secret_key=AWS_SECRET,
               s3_staging_dir='path_to_staging_dir',
               region_name=REGION)

cursor = athena_conn.cursor();

query = 'SELECT current_date;'

cursor.execute(query)
df = as_pandas(cursor)
print(df)

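Answer 2 (score: 0)

The problem is not straightforward, but my guess is that it is related to your credentials. You should investigate a little: try printing your keys and verify that they are valid.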
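
As a rough sketch of that check, the snippet below prints credentials without exposing them in full and flags obvious mismatches. It relies on the AWS convention that temporary (STS) access key IDs start with ASIA while long-term IAM user keys start with AKIA. mask and describe_credentials are hypothetical helper names, and the check only inspects the shape of the values; it cannot tell whether AWS will actually accept them.

```python
def mask(value, visible=4):
    """Return `value` with all but the first `visible` characters hidden."""
    if not value:
        return '<missing>'
    return value[:visible] + '*' * max(len(value) - visible, 0)


def describe_credentials(access_key, secret_key, token=None):
    """Summarize a credential set and list obvious inconsistencies."""
    problems = []
    # Assumption: the ASIA prefix marks temporary (STS) access keys.
    temporary = bool(access_key) and access_key.startswith('ASIA')
    if not access_key:
        problems.append('access key is missing')
    if not secret_key:
        problems.append('secret key is missing')
    if temporary and not token:
        problems.append('temporary (ASIA...) key used without a session token')
    if token and access_key and not temporary:
        problems.append('session token supplied with a non-temporary key')
    return {
        'access_key': mask(access_key),
        'secret_key': mask(secret_key, visible=0),
        'has_token': bool(token),
        'temporary': temporary,
        'problems': problems,
    }


# Made-up placeholder values, not real keys.
print(describe_credentials('ASIAIOSFODNN7EXAMPLE', 'not-a-real-secret', None))
```

If the summary reports a temporary key with no token, the connect() call without token= is expected to fail regardless of the driver.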

Here is an alternative way I use to load the credentials:

import configparser
import os  # needed for os.path.expanduser below

aws_config_file = '~/.aws/config'

# Parse the INI-style AWS config file from the home directory.
Config = configparser.ConfigParser()
Config.read(os.path.expanduser(aws_config_file))

access_key_id = Config['default']['aws_access_key_id']
secret_key_id = Config['default']['aws_secret_access_key']
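
For an MFA-based session the profile also needs a session token. Here is a minimal sketch that reads all three values, assuming they are stored under the standard key names aws_access_key_id, aws_secret_access_key and aws_session_token in an INI-style file. load_profile is a hypothetical helper, and its default path ~/.aws/credentials is an assumption (the snippet above reads ~/.aws/config). Note that on Python 2.7 the standard-library module is named ConfigParser, so this form needs Python 3 or the configparser backport.

```python
import configparser
import os


def load_profile(path='~/.aws/credentials', profile='default'):
    """Return (access_key, secret_key, token) for `profile`; token may be None."""
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; an empty list means
    # the file was missing or unreadable.
    if not parser.read(os.path.expanduser(path)):
        raise IOError('cannot read credentials file: %s' % path)
    if not parser.has_section(profile):
        raise KeyError('profile not found: %s' % profile)
    section = parser[profile]
    return (
        section['aws_access_key_id'],
        section['aws_secret_access_key'],
        section.get('aws_session_token'),  # absent for long-term credentials
    )
```

The returned tuple maps directly onto the access_key, secret_key and token arguments used in the question's connect() call.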

Otherwise, just to make sure the problem is not related to the JDBC driver, please paste the output of the following commands:

import pyathenajdbc

print(pyathenajdbc.ATHENA_CONNECTION_STRING)
print(pyathenajdbc.ATHENA_DRIVER_CLASS_NAME)
print(pyathenajdbc.ATHENA_DRIVER_DOWNLOAD_URL)
print(pyathenajdbc.ATHENA_JAR)