I'm using PyCharm Pro to connect to AWS Athena. The connection succeeds, but whenever I run a query I get:
The requested fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try again. Refer to the Athena documentation for valid fetchSize values.
I downloaded the Athena JDBC driver from the AWS Athena JDBC documentation. What could be the problem?
Answer 0 (score: 1)
One thing to consider about fetch size with JDBC and AWS Athena: there appears to be a semi-documented but well-known limit of 1,000 rows per fetch. I know the popular PyAthenaJDBC library uses it as its default fetch size, so this may be part of your problem.
I can reproduce the fetch-size error when I try to fetch more than 1,000 rows at a time:
from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://SOMEBUCKET/',
               region_name='us-east-1')
cur = conn.cursor()

# A 5000-row result is fine as long as each individual fetch stays
# at or below 1000 rows (the default fetch size).
cur.execute('SELECT * FROM SOMEDATABASE.big_table LIMIT 5000')
results = cur.fetchall()
print(len(results))

# Note: the cursor class uses a property setter to keep users from
# setting illegal fetch sizes, so write to the private attribute to
# force a value one greater than the 1000-row limit.
cur._arraysize = 1001
cur.execute('SELECT * FROM athena_test.big_table LIMIT 5000')
results = cur.fetchall()  # raises the fetchSize error
java.sql.SQLExceptionPyRaisable: java.sql.SQLException: The requested fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try again. Refer to the Athena documentation for valid fetchSize values.
Potential solutions include fetching the results in batches of at most 1,000 rows (a sketch follows), or bypassing the JDBC driver entirely by running the query through boto3 and downloading Athena's result file from S3.
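As a minimal sketch of the batching approach, assuming the standard DB-API fetchmany() interface that pyathenajdbc cursors expose (the bucket, database, and table names are placeholders):

from pyathenajdbc import connect

conn = connect(s3_staging_dir='s3://SOMEBUCKET/',
               region_name='us-east-1')
cur = conn.cursor()
cur.execute('SELECT * FROM SOMEDATABASE.big_table LIMIT 5000')

# Accumulate the result set in batches that stay within the
# 1000-rows-per-fetch limit instead of one oversized fetch.
results = []
while True:
    batch = cur.fetchmany(1000)
    if not batch:
        break
    results.extend(batch)
print(len(results))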
For many of my Python scripts, I instead use a workflow similar to the following:
import boto3
import time

sql = 'SELECT * from athena_test.big_table'
database = 'SOMEDATABASE'
bucket_name = 'SOMEBUCKET'
output_path = '/home/zerodf/temp/somedata.csv'

client = boto3.client('athena')
config = {'OutputLocation': 's3://' + bucket_name + '/',
          'EncryptionConfiguration': {'EncryptionOption': 'SSE_S3'}}

# Start the query; Athena writes the full result set to the
# OutputLocation as <QueryExecutionId>.csv.
execution_results = client.start_query_execution(
    QueryString=sql,
    QueryExecutionContext={'Database': database},
    ResultConfiguration=config)
execution_id = str(execution_results['QueryExecutionId'])
remote_file = execution_id + '.csv'

# Poll until the query has finished.
while True:
    query_execution_results = client.get_query_execution(
        QueryExecutionId=execution_id)
    if query_execution_results['QueryExecution']['Status']['State'] == 'SUCCEEDED':
        break
    else:
        time.sleep(60)

# Download the complete result file from S3.
s3 = boto3.resource('s3')
s3.Bucket(bucket_name).download_file(remote_file, output_path)
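This sidesteps the JDBC fetch path (and its 1,000-row limit) entirely: Athena always writes the complete result set to the S3 output location, so the script only has to wait for the query to finish and then download the file.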
Obviously, production code is more complicated (for one thing, it should handle the FAILED and CANCELLED query states rather than polling forever).
Answer 1 (score: 0)