Question

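I have several page URLs. Here I am trying two of them, separated by `:::`, so I expect the two URLs to be printed separately, but in the output I just get the same URL twice.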

from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"

for record in records.split(':::'):
    p1 = urlparse(records, 'https://')
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' +records
        p2 = urlparse('.jpg', netloc, path)
        p3=print(p2.geturl())

Answer 1

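I believe you have a typo: replace `records` with `record` on lines 4 and 8 of your code (the `urlparse(...)` call and the `netloc = ...` assignment).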

Output of your current code:

https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
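The same line is printed twice because the loop body reads `records` (the whole string) instead of the loop variable `record`. A minimal sketch of that mistake, using a shortened placeholder input (`/a` and `/b` are made up for illustration, not taken from the question):

```python
records = "/a:::/b"  # placeholder input, not the real URLs

# Bug: the body uses `records`, so every iteration sees the full string.
buggy = [records for record in records.split(":::")]

# Fix: use the loop variable `record` instead.
fixed = [record for record in records.split(":::")]

print(buggy)  # ['/a:::/b', '/a:::/b']
print(fixed)  # ['/a', '/b']
```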

After the replacement:

from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"

for record in records.split(':::'):
    p1 = urlparse(record, 'https://')  # parse this record, not the whole `records` string
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' + record  # prefix the site to this record only
        p2 = urlparse('.jpg', netloc, path)
        p3 = print(p2.geturl())

Output:
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:.jpg
https://amazon.in//Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
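
Note that the output above still contains a doubled slash and ends in `:.jpg`. That is because `urlparse('.jpg', netloc, path)` passes the whole URL as `urlparse`'s second positional argument, which is the default `scheme`. If the goal is simply one absolute URL per record, a sketch along these lines with `urllib.parse.urljoin` may be closer to what is wanted. The base `https://amazon.in/` comes from the question; the helper name `to_absolute` and the shortened sample paths are assumptions made for illustration:

```python
from urllib.parse import urljoin

BASE = "https://amazon.in/"  # site prefix taken from the question


def to_absolute(records, sep=":::"):
    """Split `records` on `sep` and resolve each non-empty piece against BASE."""
    return [urljoin(BASE, record) for record in records.split(sep) if record]


# Shortened, made-up paths standing in for the real product URLs.
sample = "/Some-Product/dp/B000000001?sr=1-1:::/Other-Product/dp/B000000002?sr=1-4"

for url in to_absolute(sample):
    print(url)
```

Because each record starts with `/`, `urljoin` replaces the path of the base and keeps the query string, so no doubled slash appears.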
