Question

After calling Beautiful Soup's soup.findAll('a', {'link': 'go to'}), I extracted a series of links such as:

lis_links = ['https://foo.com/019774_s009_TEV 234.xml https://foo.com/019774_s009_TEV 23.xml https://foo.com/019774_s009_TEV24.xml https://foo.com/019774_s009_TEV 120.xml https://foo.com/WERW FOR INJ.xml']

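As you can see, some of the links contain a space (" "). How can I fix those spaces with the correct encoding (I guess %20)? I tried replace(' ', '%20'), but I can't control where it is applied, so the spaces that separate one link from the next get replaced as well.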

Answer 1

Use a negative lookahead to match every whitespace character that is not followed by http: \s(?!http)


Python example

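import re

def fix_links(text):
    # Replace each whitespace character that is NOT followed by "http" with %20.
    # Whitespace directly before "http" separates two links, so it is left alone.
    return re.sub(r"\s(?!http)", "%20", text)

links = ["https://foo.com/019774_s009_TEV 234.xml https://foo.com/019774_s009_TEV 23.xml https://foo.com/019774_s009_TEV24.xml https://foo.com/019774_s009_TEV 120.xml https://foo.com/WERW FOR INJ.xml"]

links[0] = fix_links(links[0])

# Python 3: print is a function
print(links[0])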

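If separate links are more useful than one long string, a possible variation is to split at the whitespace that comes right before each http and then percent-encode every link with the standard library's urllib.parse.quote. This is only a sketch: the helper name split_and_encode is made up for illustration, and it assumes the links contain no query strings and no existing % escapes, since quote with safe=":/" would re-encode those characters.

```python
import re
from urllib.parse import quote

def split_and_encode(joined):
    # Split only at whitespace that is directly followed by a URL scheme,
    # i.e. at the boundary between two links.
    urls = re.split(r"\s+(?=https?://)", joined)
    # Percent-encode each link; ":" and "/" stay as-is so the scheme
    # and path separators survive. Spaces become %20.
    return [quote(url, safe=":/") for url in urls]

lis_links = ["https://foo.com/019774_s009_TEV 234.xml https://foo.com/019774_s009_TEV 23.xml https://foo.com/019774_s009_TEV24.xml https://foo.com/019774_s009_TEV 120.xml https://foo.com/WERW FOR INJ.xml"]

fixed = split_and_encode(lis_links[0])
for url in fixed:
    print(url)
```

Compared with a plain replace, this also encodes other unsafe characters in the path, not just spaces, and it returns a real list with one entry per link.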

How do I correctly fix a list of links in Python 3?
