This is my string, from which I need to extract the URLs:
s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
This is the code I have tried so far:
urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', s)
But it only prints this URL, repeated:
['https://www.riteaid.com']
Answer 0 (score: 1)
Since, as you mentioned, the data is a plain string, you need a regular expression tailored to this specific format.
import re
s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
urls = re.findall(r"url:'(https?://.*?)'}", s)
Result:
['https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442',
'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009',
'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249',
'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568']
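Explanation:
url:'(http : literal string; the "(" opens the capture group that findall returns
s? : optional literal character "s"
:// : literal string
.*? : any character, matched non-greedily (as few as possible)
)'} : literal string '}; the ")" closes the capture group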
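The breakdown above can also be written directly into the pattern with verbose mode, which may make it easier to maintain. This is only a sketch: the shortened string `sample`, its example.com URLs, and the name `URL_PATTERN` are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Hypothetical shortened sample in the same format as the question's string.
sample = (
    "'0000001':{url:'https://example.com/shop/item-one-0000001'},"
    "'0000002':{url:'http://example.com/shop/item-two-0000002'},}"
)

# Same pattern as above; re.VERBOSE ignores whitespace and allows comments.
URL_PATTERN = re.compile(
    r"""
    url:'        # literal text that precedes every URL
    (            # start of the capture group returned by findall
      https?://  # http, an optional s, then ://
      .*?        # any characters, as few as possible
    )            # end of the capture group
    '}           # literal text that follows every URL
    """,
    re.VERBOSE,
)

urls = URL_PATTERN.findall(sample)
print(urls)
```

Because the pattern contains one capture group, `findall` returns only the captured URL text, not the surrounding `url:'` and `'}`.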
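Answer 1 (score: 0)
If you have to use a regex for your current example, you can match between {url:' and '} by using a positive lookbehind (?<= and a positive lookahead (?=, with a negated character class [^']+ in between that matches one or more characters other than ' to match the URL.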
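The answer's exact pattern is not shown here, so the following is only a sketch of how a lookbehind, a negated character class, and a lookahead could be combined as described. The string `s` with its example.com URLs and the name `lookaround_urls` are assumptions for illustration, and the braces are escaped for readability.

```python
import re

# Hypothetical shortened sample in the same format as the question's string.
s = (
    "'0000001':{url:'https://example.com/shop/item-one-0000001'},"
    "'0000002':{url:'https://example.com/shop/item-two-0000002'},}"
)

# (?<=\{url:')  lookbehind: the match must be preceded by {url:'
# [^']+         one or more characters that are not a single quote
# (?='\})       lookahead: the match must be followed by '}
lookaround_urls = re.findall(r"(?<=\{url:')[^']+(?='\})", s)
print(lookaround_urls)
```

Since lookarounds do not consume characters, there is no capture group here; `findall` returns the whole match, which is just the URL.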
You could also make the pattern less restrictive with respect to your example data and omit the leading { and the trailing }.
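A sketch of what such a looser variant might look like, assuming it only requires `url:'` before the URL and a closing `'` after it. The pattern, the string `mixed`, and the name `loose_urls` are assumptions for illustration, not taken from the answer.

```python
import re

# Hypothetical input where only one of the two URLs is wrapped in braces.
mixed = "url:'https://example.com/a', {url:'https://example.com/b'}"

# (?<=url:')  lookbehind: preceded by url:' (no opening brace required)
# [^']+       one or more characters that are not a single quote
# (?=')       lookahead: followed by a single quote (no closing brace required)
loose_urls = re.findall(r"(?<=url:')[^']+(?=')", mixed)
print(loose_urls)
```

The trade-off is that a looser pattern will also match any other `url:'...'` fragment in the input, whether or not it is wrapped in braces.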