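I'm using pyAthena, pandas, and SQLAlchemy to pull data from AWS Athena. It works most of the time, but I randomly get 407 Proxy Authentication errors, both on my desktop and in a SQL Server Agent job.
I have a job agent on SQL Server that runs the Python application from the command line. The job fails unless I first run the application on my desktop; after that, it succeeds under the job agent.
First, I set up the proxy in Python: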
import logging
from os import environ

def proxy(nt_login='', pswd=''):
    """
    This function will handle setting up the proxy
    """
    logging.info('Setting up proxy')
    # setup proxy: the same authenticated URL is used for HTTP and HTTPS traffic
    HTTP_PROXY = f'http://{nt_login}:{pswd}@proxy.global.company_name.com:8080'
    environ["http_proxy"] = HTTP_PROXY
    environ["https_proxy"] = HTTP_PROXY
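One thing worth ruling out with a setup like the one above is a login or password containing characters such as `@`, `:`, or `\` (for example an NT login of the form `DOMAIN\user`), since those need to be percent-encoded inside a URL. Whether this is related to the 407 errors is not established. The following is only a minimal sketch, assuming the same proxy host and port as above. `build_proxy_url` and `set_proxy` are hypothetical helper names, not part of any library:

```python
from os import environ
from urllib.parse import quote


def build_proxy_url(nt_login, pswd, host='proxy.global.company_name.com', port=8080):
    """Return a proxy URL with percent-encoded credentials (hypothetical helper)."""
    # safe='' makes quote() encode every reserved character, including '/', ':' and '@'
    user = quote(nt_login, safe='')
    password = quote(pswd, safe='')
    return f'http://{user}:{password}@{host}:{port}'


def set_proxy(nt_login, pswd):
    """Export the encoded proxy URL for both HTTP and HTTPS traffic (hypothetical helper)."""
    proxy_url = build_proxy_url(nt_login, pswd)
    environ['http_proxy'] = proxy_url
    environ['https_proxy'] = proxy_url
```

Calling `set_proxy(NT_LOGIN, PSWD)` in place of `proxy(NT_LOGIN, PSWD)` would keep the rest of the script unchanged.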
I've stripped out all the code that isn't related to connecting to AWS Athena.
# setup proxy before athena connection
proxy(NT_LOGIN, PSWD)
# create connection strings
athena_url = f'awsathena+rest://{AWS_ACCESS_KEY_ID}:{AWS_SECRET_ACCESS_KEY}@athena.{REGION_NAME}.amazonaws.com:443/{SCHEMA_NAME}?s3_staging_dir={S3_STAGING_DIR}'
# create sqlalchemy engines
athena_engine = create_engine(athena_url)
# Put max_timestamp in a param dictionary
params = {'start_timestamp': str(df.loc[0, 'MaxDate'])
        , 'end_timestamp': str(df.loc[0, 'MaxDate'] + timedelta(days=5))
        }
# if end_timestamp > today, change it to today so we don't pull today
today = datetime.now().strftime('%Y-%m-%d')
if params['end_timestamp'] > today:
    params['end_timestamp'] = today
# Define destination table
columns = {
    'ticket_id' : NVARCHAR(100)
    , 'created_time' : NVARCHAR(100)
    , 'agent_emp_id' : NVARCHAR(100)
    , 'workflow_utilized' : NVARCHAR(500)
}
query = str(readFile('athenaSource.sql','sql'))
df = pd.read_sql(query, athena_engine.connect().connection, params=params)
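As an aside, the date-window logic above caps `end_timestamp` by comparing strings. A rough sketch of the same idea using date objects is shown here, assuming `MaxDate` is a `date` or `datetime` (a pandas `Timestamp` is a `datetime` subclass). `build_params` is a hypothetical helper, and it emits date-only strings, which may differ from the format the original query expects:

```python
from datetime import date, datetime, timedelta


def build_params(max_date, today=None, window_days=5):
    """Return start/end date strings, with the end capped at today (hypothetical helper)."""
    today = today or date.today()
    # Reduce datetimes (including pandas Timestamps) to plain dates before comparing
    start = max_date.date() if isinstance(max_date, datetime) else max_date
    end = min(start + timedelta(days=window_days), today)
    return {'start_timestamp': start.isoformat(), 'end_timestamp': end.isoformat()}
```

It could be called as `params = build_params(df.loc[0, 'MaxDate'])`.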
Setting the job agent aside, why do I randomly get 407 Proxy Authentication errors on my desktop? Am I setting up the proxy incorrectly?