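I have several page URLs. Here I am trying with two of them, separated by :::, so I expect the two URLs to be printed separately, but the output just shows the same URL twice.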
from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"
for record in records.split(':::'):
    p1 = urlparse(records, 'https://')
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' +records
    p2 = urlparse('.jpg', netloc, path)
    p3=print(p2.geturl())
Answer 0 (score: 0)
I believe you have a typo: replace records with record on lines 4 and 8.
Output of your current code:
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
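Both lines are identical because the loop body never uses the loop variable: it works on records, the whole unsplit string, on every iteration. A minimal sketch of that effect, using short made-up paths in place of the real ones (the helper name parts_seen and the sample string are illustrative assumptions, not part of the original code):

```python
# Hypothetical short stand-ins for the two real paths.
records = "/first/path?x=1:::/second/path?y=2"


def parts_seen(use_loop_variable):
    """Collect the value the loop body would work on in each iteration."""
    seen = []
    for record in records.split(":::"):
        # The buggy version refers to `records` (the full string) here,
        # so every iteration sees the same value.
        seen.append(record if use_loop_variable else records)
    return seen


buggy = parts_seen(False)  # the full string, repeated once per part
fixed = parts_seen(True)   # one entry per part
print(buggy)
print(fixed)
```

The split itself is fine in both cases; only the name used inside the loop differs.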
After the replacement:
from urllib.parse import urlparse
records="/Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:::""/Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4"
for record in records.split(':::'):
    p1 = urlparse(record, 'https://')
    netloc = p1.netloc
    path = p1.path if p1.netloc else ''
    if not netloc.startswith('amazon.in/'):
        netloc = 'https://amazon.in/' +record
    p2 = urlparse('.jpg', netloc, path)
    p3=print(p2.geturl())
Output:
https://amazon.in//Bluetooth-Earphone-Control-Smartphones-Powerful/dp/B07NBQ67BN/ref=sr_1_1?fst=as%3Aoff&qid=1554760894&refinements=p_89%3AA+%26+Y&rnid=3837712031&s=electronics&sr=1-1:.jpg
https://amazon.in//Aroma-Magic-Mineral-Scrub-100ml/dp/B00H1Q4VZQ/ref=sr_1_4?fst=as%3Aoff&qid=1554351778&refinements=p_89%3AAroma+Magic&rnid=3837712031&s=beauty&sr=1-4:.jpg
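Note that each line still has a double slash after amazon.in and ends in :.jpg. The signature is urlparse(urlstring, scheme, allow_fragments), so in urlparse('.jpg', netloc, path) the whole prefixed string is taken as the default scheme, and geturl() joins it to '.jpg' with a colon. If the goal is only to turn each relative path into an absolute amazon.in URL (an assumption; the intent behind .jpg is not clear from the question), a sketch with urljoin might look like this. The names BASE and to_absolute and the shortened sample paths are hypothetical:

```python
from urllib.parse import urljoin

# Base URL taken from the prefix used in the question's code.
BASE = "https://amazon.in/"


def to_absolute(records, sep=":::"):
    """Split on `sep` and resolve each relative path against BASE."""
    return [urljoin(BASE, record) for record in records.split(sep) if record]


# Shortened, made-up sample in the same shape as the real input.
sample = (
    "/A/dp/B07NBQ67BN/ref=sr_1_1?qid=1&sr=1-1"
    ":::"
    "/B/dp/B00H1Q4VZQ/ref=sr_1_4?qid=2&sr=1-4"
)

urls = to_absolute(sample)
for url in urls:
    print(url)
```

Because each path starts with a slash, urljoin replaces the base path rather than appending to it, which avoids the double slash.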