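I am trying to request batch log-level data from the AppNexus API. According to the official data service guide, there are four main steps:
1. Account authentication -> returns a token in JSON
2. Get the list of available data feeds and look up the download parameters -> returns the parameters in JSON
3. Request the file download location code by passing the download parameters -> extract the location code from the response header
4. Download the log data file by passing the location code -> returns a gz data file
These steps work perfectly in the terminal using curl:
curl -b cookies -c cookies -X POST -d @auth 'https://api.appnexus.com/auth'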
curl -b cookies -c cookies 'https://api.appnexus.com/siphon?siphon_name=standard_feed'
curl --verbose -b cookies -c cookies 'https://api.appnexus.com/siphon-download?siphon_name=standard_feed&hour=2017_12_28_09&timestamp=20171228111358&member_id=311&split_part=0'
curl -b cookies -c cookies 'http://data-api-gslb.adnxs.net/siphon-download/[location code]' > ./data_download/log_level_feed.gz
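I am now trying to do the same against the API in Python, but it keeps giving me a "ConnectionError". Steps 1 and 2 still work fine, so I can get the parameters from the JSON response and build the URL for step 3, where I need to request the location code and extract it from the response header.
Step 1: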
# Step 1
import json
import re
import requests
from requests.exceptions import ConnectionError
############ Authentication ###########################
# Select End-Point
auth_endpoint = 'https://api.appnexus.com/auth'
# API Key
auth_app = json.dumps({'auth':{'username':'xxxxxxx','password':'xxxxxxx'}})
# Proxy
proxy = {'https':'https://proxy.xxxxxx.net:xxxxx'}
r = requests.post(auth_endpoint, proxies=proxy, data=auth_app)
data = json.loads(r.text)
token = data['response']['token']
Step 2:
# Step 2
########### Check report list ###################################
check_list_endpoint = 'https://api.appnexus.com/siphon?siphon_name=standard_feed'
report_list = requests.get(check_list_endpoint, proxies=proxy, headers={"Authorization":token})
data = json.loads(report_list.text)
print(str(len(data['response']['siphons'])) + ' previous hours available for download')
# Build url for single report - extract para
download_endpoint = 'https://api.appnexus.com/siphon-download'
siphon_name = 'siphon_name=standard_feed'
hour = 'hour=' + data['response']['siphons'][400]['hour']
timestamp = 'timestamp=' + data['response']['siphons'][400]['timestamp']
member_id = 'member_id=311'
split_part = 'split_part=' + data['response']['siphons'][400]['splits'][0]['part']
# Build url
download_endpoint_url = download_endpoint + '?' + \
    siphon_name + '&' + \
    hour + '&' + \
    timestamp + '&' + \
    member_id + '&' + \
    split_part
# Check
print(download_endpoint_url)
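The URL construction in Step 2 can also be done with the standard library instead of string concatenation. The following is a minimal sketch, assuming Python 3; the helper name `build_download_url` and the sample entry are illustrative (the sample values are copied from the curl command earlier in the post) and are not part of the AppNexus API.

```python
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = 'https://api.appnexus.com/siphon-download'


def build_download_url(siphon, member_id, siphon_name='standard_feed', split_index=0):
    """Build the siphon-download URL from one entry of data['response']['siphons'].

    `siphon` is assumed to have the shape used in the question:
    {'hour': ..., 'timestamp': ..., 'splits': [{'part': ...}, ...]}.
    """
    params = {
        'siphon_name': siphon_name,
        'hour': siphon['hour'],
        'timestamp': siphon['timestamp'],
        'member_id': member_id,
        'split_part': siphon['splits'][split_index]['part'],
    }
    # urlencode escapes each value and joins the pairs with '&'.
    return DOWNLOAD_ENDPOINT + '?' + urlencode(params)


# Hypothetical list entry carrying the values from the curl command.
example = {'hour': '2017_12_28_09', 'timestamp': '20171228111358',
           'splits': [{'part': '0'}]}
url = build_download_url(example, member_id=311)
print(url)
```

With requests, the same dictionary can instead be passed as `params=` to `get()`, which performs the encoding itself.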
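However, the "requests.get" call in step 3 below never runs to completion; it keeps raising a "ConnectionError". I also noticed that the "location code" actually appears in the error message, right after "/siphon-download/". So I used "try..except" to extract it from the error message and keep the code running.
Step 3: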
# Step 3
######### Extract location code for target report ####################
try:
    TT = requests.get(download_endpoint_url, proxies=proxy, headers={"Authorization":token}, timeout=1)
except ConnectionError as e:
    text = e.args[0].args[0]
    m = re.search('/siphon-download/(.+?) ', text)
    if m:
        location = m.group(1)
        print('Successfully Extracting location: ' + location)
Original error message from Step 3 without the "try..except":
ConnectionError: HTTPConnectionPool(host='data-api-gslb.adnxs.net', port=80): Max retries exceeded with url:
/siphon-download/dbvjhadfaslkdfa346583
(Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000007CBC7B8>:
Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection failed because connected host has failed to respond',))
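Then I tried to make the final GET request with the location code extracted from the previous error message, to download the gz data file the same way I do with "curl" in the terminal. However, I got the same error message: ConnectionError.
Step 4: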
# Step 4
######## Download data file #######################
extraction_location = 'http://data-api-gslb.adnxs.net/siphon-download/' + location
LLD = requests.get(extraction_location, proxies=proxy, headers={"Authorization":token}, timeout=1)
Original error message from Step 4:
ConnectionError: HTTPConnectionPool(host='data-api-gslb.adnxs.net', port=80): Max retries exceeded with url:
/siphon-download/dbvjhadfaslkdfa346583
(Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000007BE15C0>:
Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection failed because connected host has failed to respond',))
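To double-check, I used curl in the terminal to test all the endpoints, parameters, and location codes generated by my Python script. They all work fine, and the downloaded data is correct. Can anyone help me solve this problem in Python, or point me in the right direction to find out why it happens? Thank you very much!
Answer 0 (score: 1)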
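1) With curl you are reading and writing cookies (-b cookies -c cookies). With requests you are not using a session object (http://docs.python-requests.org/en/master/user/advanced/#session-objects), so your cookie data is lost.
2) You defined an https proxy, and then you try to connect over http (to data-api-gslb.adnxs.net) with no proxy. Define both http and https, but only once, on the session object. See http://docs.python-requests.org/en/master/user/advanced/#proxies. (This is probably the root cause of the error message you see.)
3) requests handles redirects automatically, so there is no need to extract the Location header and use it in the next request. Once the other errors are fixed, there are three steps instead of four. (This also answers Hetzroni's question in the comments above.)
So use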
s = requests.Session()
s.proxies = {
    'http':'http://proxy.xxxxxx.net:xxxxx',
    'https':'https://proxy.xxxxxx.net:xxxxx'
} # set this only once using valid proxy urls.
and then use
s.get()
and
s.post()
instead of
requests.get()
and
requests.post()
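Putting the three points together, the whole flow might look like the sketch below. It is a minimal sketch, assuming Python 3 and the requests package, and it has not been run against the API. The function names, the `member_id` and `out_path` parameters, and the choice of the first list entry are assumptions made for illustration; sending the token in an `Authorization` header simply follows the question's code.

```python
import json

API = 'https://api.appnexus.com'


def make_session(proxies):
    """Create one Session so cookies and proxy settings persist across requests."""
    import requests  # third-party package, imported lazily; the rest is stdlib-only
    s = requests.Session()
    s.proxies = proxies  # should contain both an 'http' and an 'https' entry
    return s


def download_feed(s, username, password, member_id, out_path):
    """Authenticate, pick a feed, and download it using the session `s`."""
    # Step 1: authenticate. The session stores any cookies the API sets.
    auth = json.dumps({'auth': {'username': username, 'password': password}})
    r = s.post(API + '/auth', data=auth)
    r.raise_for_status()
    # Token in an Authorization header, as in the question's code.
    s.headers.update({'Authorization': r.json()['response']['token']})

    # Step 2: list the available feeds.
    r = s.get(API + '/siphon', params={'siphon_name': 'standard_feed'})
    r.raise_for_status()
    siphon = r.json()['response']['siphons'][0]  # arbitrary choice: first entry

    # Step 3: request the download. requests follows the redirect to the data
    # host on its own; r.history holds the intermediate redirect responses.
    params = {
        'siphon_name': 'standard_feed',
        'hour': siphon['hour'],
        'timestamp': siphon['timestamp'],
        'member_id': member_id,
        'split_part': siphon['splits'][0]['part'],
    }
    r = s.get(API + '/siphon-download', params=params, stream=True)
    r.raise_for_status()
    with open(out_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
    return out_path
```

A call would then be something like `download_feed(make_session(proxies), 'user', 'password', 311, 'log_level_feed.gz')`, where `proxies` is the two-entry dictionary shown above.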