Downloading images from page URLs

Time: 2019-05-24 06:45:41

Tags: python python-3.x list python-requests pycharm

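I have several page URLs. Here I am trying two of them, separated by `:::`, so I expect the two URLs to be printed separately. Instead, the output contains the whole combined string twice.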

from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"

for record in records.split(':::'):
    p1 = urlparse(records, 'https://')
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' +records
        p2 = urlparse('.jpg', netloc, path)
        p3=print(p2.geturl())

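1 Answer:

Answer 0: (score: 0)

I believe you have a typo. Inside the loop, replace `records` with `record` in two places: the `urlparse(records, 'https://')` call and the `netloc = 'https://amazon.in/' +records` assignment.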

Output of your current code:

https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
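
To see why both lines are identical, here is a minimal sketch of the same mistake on a short, made-up string (`"a:::b"` is a placeholder, not data from the question). The loop runs once per piece, but the body reads the whole string instead of the loop variable:

```python
# Hypothetical two-part string standing in for the long URL pair.
records = "a:::b"

# Bug: the body uses `records` (the whole string), so every pass is identical.
wrong = [records for record in records.split(":::")]

# Fix: use the loop variable `record`, which holds one piece per pass.
right = [record for record in records.split(":::")]

print(wrong)
print(right)
```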

After the replacement:

from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"

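# Process each piece separately; note `record` (the loop variable), not `records`
for record in records.split(':::'):
    p1 = urlparse(record, 'https://')
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' +record
        p2 = urlparse('.jpg', netloc, path)
        # print() returns None, so there is no need to assign its result
        print(p2.geturl())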

Output:
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:.jpg
https://amazon.in//Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
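
The URLs above are now separate, but each still contains a double slash and ends in `:.jpg`. The double slash comes from concatenating `'https://amazon.in/'` with a path that already starts with `/`. The `:.jpg` suffix appears because the second positional argument of `urlparse` is a default scheme, not a network location, so the whole string is treated as a scheme and joined to `.jpg` with a colon. If the goal is simply a clean absolute URL for each page, one possible approach is `urllib.parse.urljoin`. The sketch below assumes the `https://amazon.in/` base from the question; `build_page_urls` and the shortened sample paths are hypothetical names and data used only for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://amazon.in/"  # base taken from the question's code


def build_page_urls(records, sep=":::"):
    """Split the joined string and resolve each path against BASE_URL."""
    return [urljoin(BASE_URL, part) for part in records.split(sep) if part]


# Shortened, hypothetical paths standing in for the long ones above.
sample = "/item-one/dp/AAA?sr=1-1:::/item-two/dp/BBB?sr=1-4"
for url in build_page_urls(sample):
    print(url)
```

This only builds the page URLs. Finding the actual image address on each page is a separate step that is not covered here.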