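I'm trying to retrieve the number of pins for a given URL. I wrote this Python script, which takes two separate URLs and prints the pin count for each. When I run the script on my local machine I get a 200 response containing the pin count; however, when I run exactly the same script on an EC2 instance I get a 403 error.
Here is the Python script: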
#!/usr/bin/python
import requests
# Pinterest API
pinterest_endpoint = "http://api.pinterest.com/v1/urls/count.json?callback=&url="
# Emulate a SQL Query result (id, url)
results = [(1, "http://allrecipes.com/recipe/easter-nests/detail.aspx"), (2, "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html")]
# Cycle thru each URL
for url in results:
    # Print URL details
    print url[0]
    print url[1]
    print type(url[0])
    print type(url[1])
    print "Downloading: ", url[1]
    # Create Complete URL
    target_url = pinterest_endpoint + url[1]
    print target_url
    # Hit Pinterest API
    r = requests.get(target_url)
    print r
    print r.text
    # Parse string response
    start = r.text.find('\"count\"')
    end = r.text.find(',', start+1)
    content = len('\"count\"')
    pin_count = int(r.text[(start+content+1):end].strip())
    print pin_count
This is the response I get on my local machine (Ubuntu 12.04):
$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading: http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [200]>
({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})
997
2
http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<type 'int'>
<type 'str'>
Downloading: http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html
<Response [200]>
({"count": 993, "url": "http://www.foodnetwork.com/recipes/ina-garten/maple-oatmeal-scones-recipe/index.html"})
993
This is the response I get when I run the same script on the EC2 instance (Ubuntu):
$ python pin_count.py
1
http://allrecipes.com/recipe/easter-nests/detail.aspx
<type 'int'>
<type 'str'>
Downloading: http://allrecipes.com/recipe/easter-nests/detail.aspx
http://api.pinterest.com/v1/urls/count.json?callback=&url=http://allrecipes.com/recipe/easter-nests/detail.aspx
<Response [403]>
{ "status": 403, "message": "Forbidden" }
Traceback (most recent call last):
  File "cron2.py", line 32, in <module>
    pin_count = int(r.text[(start+content+1):end].strip())
ValueError: invalid literal for int() with base 10: 'us": 403'
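I understand why it throws the ValueError. What I don't understand is why I get a 403 response when I run the script from the EC2 instance, while it works as expected from my local machine.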
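As a side note on the ValueError: it can be avoided by checking the status code first and parsing the body as JSON rather than slicing the string. The following is only a minimal sketch in Python 3 (the script above is Python 2). The helper name `parse_pin_count` is made up for illustration, and it assumes the body is either plain JSON or JSON wrapped in parentheses, as in the two responses shown above.

```python
import json


def parse_pin_count(status_code, body):
    """Return the pin count from a count.json response, or None on failure."""
    # A blocked request (e.g. 403) carries no count, so stop early.
    if status_code != 200:
        return None
    # With an empty callback the body is JSON wrapped in parentheses: ({...})
    text = body.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    try:
        payload = json.loads(text)
    except ValueError:
        # The body was not valid JSON.
        return None
    count = payload.get("count") if isinstance(payload, dict) else None
    return count if isinstance(count, int) else None


# Bodies copied from the two runs shown above.
ok_body = '({"count": 997, "url": "http://allrecipes.com/recipe/easter-nests/detail.aspx"})'
blocked_body = '{ "status": 403, "message": "Forbidden" }'

print(parse_pin_count(200, ok_body))       # 997
print(parse_pin_count(403, blocked_body))  # None
```

This does not explain or fix the 403; it only makes the script fail gracefully when the request is refused.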
Any help is greatly appreciated!
Answer 0 (score: 2)
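Not an answer, but hopefully this saves someone else an hour of trying this approach: Pinterest, unsurprisingly, also seems to block requests from Tor exit nodes.
I ran into the same problem with the same endpoint and narrowed it down to EC2 + Pinterest. I tried to get around it by routing the requests through Tor.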
class PinterestService(Service):
    service_url = "http://api.pinterest.com/v1/urls/count.json?callback="
    url_param = 'url'
    def get_response(self, url, **params):
        params[self.url_param] = url
        # privoxy listens by default on port 8118
        # on the ec2 privoxy is configured to forward
        # socks5 through tor like so:
        # http://fixitts.com/2012/05/26/installing-tor-and-privoxy-on-ubuntu-server-or-any-other-linux-machine/
        http_proxy = "socks5://127.0.0.1:8118"
        proxyDict = {
            "http" : http_proxy
        }
        return requests.get(self.service_url, params=params, proxies=proxyDict)
I cycled through numerous exit nodes, and the response was always { "status": 403, "message": "Forbidden" }.
As a workaround, I'm going to go through a private HTTP proxy server.
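For anyone wanting to try the private-proxy route, here is a rough Python 3 sketch of what it might look like. It uses only the standard library's `urllib` so it runs without extra packages; the same `{"http": ...}` mapping is what gets passed as `proxies=` to `requests.get` in the snippet above. The proxy address `proxy.example.com:3128` and the function names are placeholders I made up, and whether Pinterest accepts requests from any particular proxy is not something this sketch can tell you.

```python
import urllib.request
from urllib.parse import urlencode

PINTEREST_ENDPOINT = "http://api.pinterest.com/v1/urls/count.json"
# Hypothetical private proxy address; replace it with your own.
PROXY_URL = "http://proxy.example.com:3128"


def build_count_url(page_url):
    """Build the count.json URL with the page URL percent-encoded."""
    query = urlencode({"callback": "", "url": page_url})
    return PINTEREST_ENDPOINT + "?" + query


def build_proxy_opener(proxy_url):
    """Return an opener that sends plain-HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_count_body(page_url, proxy_url=PROXY_URL, timeout=10):
    """Fetch the raw response body through the proxy (performs network I/O)."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(build_count_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access, so it is safe to run anywhere.
print(build_count_url("http://allrecipes.com/recipe/easter-nests/detail.aspx"))
```

Note that `opener.open` raises `urllib.error.HTTPError` on a 403 instead of returning a response object, so a call to `fetch_count_body` should be wrapped in a `try`/`except` if the proxy might also be blocked.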
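Answer 1 (score: 2)
This question was asked a few years ago, and I think the current answers are out of date. EC2 now runs the script above with a successful response and no proxy. I came across this question while investigating a similar problem of my own with Google App Engine.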
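Answer 2 (score: 1)
Pinterest is probably blocking requests from Amazon-owned IP blocks, which results in the 403: Forbidden error. Pinterest does not officially support their API, so (my assumption is) they are blocking the largest source of commercial use of their API. You can test this by using an instance from a non-AWS provider.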
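To check whether a given machine's public address falls inside an Amazon-owned block, something like the Python 3 sketch below could help. It assumes a dictionary shaped like the `ip-ranges.json` file that AWS publishes (a `prefixes` list whose entries have an `ip_prefix` field). The sample data uses reserved documentation ranges, not real AWS prefixes, and `matching_prefixes` is a name invented for this example.

```python
import ipaddress


def matching_prefixes(ip, ranges):
    """Return every CIDR prefix in ranges["prefixes"] that contains ip."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for entry in ranges.get("prefixes", []):
        network = ipaddress.ip_network(entry["ip_prefix"])
        if addr in network:
            hits.append(entry["ip_prefix"])
    return hits


# Made-up sample in the assumed ip-ranges.json shape; these are reserved
# documentation ranges (TEST-NET), NOT real Amazon address blocks.
sample_ranges = {
    "prefixes": [
        {"ip_prefix": "203.0.113.0/24", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "service": "EC2"},
    ]
}

print(matching_prefixes("203.0.113.25", sample_ranges))  # ['203.0.113.0/24']
print(matching_prefixes("192.0.2.10", sample_ranges))    # []
```

A match would only show that the address belongs to a listed block; it would not prove that Pinterest blocks on that basis.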