How to force a "U-turn" on a redirect

Time: 2019-08-30 13:39:40

Tags: python redirect web-scraping python-requests

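Hey StackOverflow,

I am trying to save some images using the Python requests library, but I have run into trouble when saving images from a Chinese website.

I have three sample code snippets that illustrate my problem: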

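  1. The ideal case: saving a simple image
  2. Status code 200. The input URL and the final URL differ. A dummy image is saved
  3. Status code 302. The input URL and the final URL are the same. A strange image is saved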
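
Cases 2 and 3 can be told apart by looking at what requests records about a redirect. The following is a minimal sketch, assuming a standard requests.Response object; the helper name describe_redirects is hypothetical. With redirects followed, the intermediate responses are kept in response.history. With allow_redirects=False, nothing is followed, so response.url stays equal to the input URL and the target is only visible in the Location header.

```python
def describe_redirects(response):
    """Summarise how a requests-style response relates to redirects.

    `response` is expected to expose .status_code, .url, .headers and
    .history, as requests.Response does. The helper name is hypothetical.
    """
    # Redirect responses that were followed on the way here (case 2).
    followed = [(r.status_code, r.url) for r in response.history]

    # A redirect that was NOT followed (case 3) shows up only as a 3xx
    # status plus a Location header; response.url is still the input URL.
    pending_location = None
    if 300 <= response.status_code < 400:
        pending_location = response.headers.get('Location')

    return {
        'final_url': response.url,
        'followed': followed,
        'pending_location': pending_location,
    }


# Usage (needs network access and `pip install requests`):
#   import requests
#   r = requests.get(url, allow_redirects=False)
#   print(describe_redirects(r))
```

If pending_location is set, the bytes in response.content belong to the redirect response itself, not to the image at the target address.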

Sample images: Image1 / Image2

Functions:

import requests

def get_response(url):
    print('Input URL:\n\t %s'%(url))
    response = requests.get(url)
    return response

def get_response_dont_redirect(url):
    print('Input URL:\n\t %s'%(url))
    response = requests.get(url, allow_redirects=False)
    return response

def check_response_status(response):
    status = response.status_code
    if status == 200:
        print(('Final URL:\n\t %s')%response.url)
        print('Status Code: %s / OK'%(status))
        return 'ok'
    if status == 302:
        print(('Final URL:\n\t %s')%response.url)
        print('Status Code: %s / Redirected'%(status))
        return 'redirected'
    if status == 404:
        # 404 means the resource was not found, not that access was denied
        print('Status Code: %s / Not Found'%(status))
        return 'not found'

def save_image(response, status_code):
    if status_code == 302:
        with open('image_wanted.jpg', 'wb') as f:
            print('\nSaving image desired under "image_wanted.jpg"...\n')
            f.write(response.content)
    elif status_code == 200:
        with open('image_redirect.jpg', 'wb') as f:
            print('\nSaving image redirected under "image_redirect.jpg"...\n')
            f.write(response.content)
    elif status_code == 111:
        with open('image_normal.jpg', 'wb') as f:
            print('\nSaving image normal under "image_normal.jpg"...\n')
            f.write(response.content)

def case_1_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# This is the ideal situation where I can simply download an image')
    print('-------------------------------------------------------------------')

def case_2_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Notice that despite the status code being 200, the input URL and final URL are different ')
    print('\t>I am definitely being redirected')
    print('\t>I get a dummy image from the redirected page')
    print('-------------------------------------------------------------------')

def case_3_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Here I have set the restriction of "allow_redirects=False" yet I get status code:302 ')
    print('\t>Somehow the input and final URL are the same')
    print('\t>The image saved is perpetually loading...')
    print('-------------------------------------------------------------------')
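
Because save_image is driven by a hand-picked number (111, 200 or 302), it will happily write the body of a redirect response to disk. As a hedged alternative, the sketch below only writes the file when the status is 200 and the Content-Type header starts with image/. The helper name save_if_image is hypothetical, and the response is assumed to behave like requests.Response.

```python
def save_if_image(response, path):
    """Write response.content to `path` only if it looks like an image.

    Hypothetical helper; expects .status_code, .headers and .content as
    found on requests.Response. Returns True when the file was written.
    """
    content_type = response.headers.get('Content-Type', '')
    if response.status_code != 200 or not content_type.startswith('image/'):
        # Redirect bodies, error pages and HTML placeholders end up here.
        print('Not saving: status=%s, Content-Type=%r'
              % (response.status_code, content_type))
        return False
    with open(path, 'wb') as f:
        f.write(response.content)
    return True
```

This check would reject the 302 body from case 3. It would not catch case 2 if the dummy picture is itself served as a valid image with status 200.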

Case 1: ideal case

print("\n\n--- Case 1: Ideal ---\n")

url = 'https://i5.walmartimages.ca/images/Large/094/514/6000200094514.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 111)
case_1_comments()

Case 2: without 'allow_redirects=False'

print("\n\n--- Case 2: without 'allow_redirects=False' restriction ---\n")

url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 200)
case_2_comments()

Case 3: with 'allow_redirects=False'

print("\n\n--- Case 3: with 'allow_redirects=False' restriction ---\n")

url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response_dont_redirect(url)
status = check_response_status(response)
save_image(response, 302)
case_3_comments()

If you copy-paste my code and run it (run pip install requests first if you don't have the library), you will see that cases 2 and 3 behave very strangely. My goal is to force my way back to the input URL and save the image that lives at that address.

I have managed to stay on the input URL, as shown in case 3, but for some reason the saved image is just a loading screen.

So I guess my questions are:

  • Am I actually fighting a redirect?
  • How do I save the image I want instead of the loading image?

The functions and the three cases above, run in order, make up the entire script (please excuse the spaghetti).
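
On the second question, one thing that may be worth testing is whether the image host treats bare script requests differently from browser requests, for example by checking the Referer header (hotlink protection). Whether that is what happens here is an assumption, not something the output above confirms. The sketch below only builds the headers; browser_like_headers is a hypothetical helper, the User-Agent value is a placeholder, and the referer must be supplied by the caller (for instance, the page that embeds the image).

```python
from urllib.parse import urlsplit


def browser_like_headers(referer):
    """Build request headers resembling a browser loading an embedded image.

    Hypothetical helper. `referer` is the absolute URL of the page assumed
    to embed the image; it is validated but never fetched here.
    """
    parts = urlsplit(referer)
    if parts.scheme not in ('http', 'https') or not parts.netloc:
        raise ValueError('referer must be an absolute http(s) URL: %r' % referer)
    return {
        'Referer': referer,
        # Placeholder value; a real browser sends a much longer string.
        'User-Agent': 'Mozilla/5.0',
    }


# Usage (needs network access and `pip install requests`):
#   import requests
#   headers = browser_like_headers(page_url)  # page_url: supplied by you
#   response = requests.get(url, headers=headers)
```

If the response with these headers is still a redirect or a placeholder, the cause lies elsewhere and this guess can be discarded.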

0 answers:

No answers yet