Question

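I am scraping a website and saving selected pages for offline browsing. Naturally, the links between the pages break, so I want to do something about that. Is there a simple way in Python to make URL paths relative, like this, without reinventing the wheel: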

/folder1/folder2/somepage.html    becomes---->   folder2/somepage.html
/folder1/otherpage.html           becomes---->   ../otherpage.html

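I understand that such a function needs two URL paths to determine the relative path, because the path to a linked resource is relative to the page it appears on.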

Answer 1

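The newer pathlib (mentioned only briefly in the duplicate) also works for URLs, as long as the target lies under the base. Otherwise `relative_to` raises ValueError, so by default it cannot produce `../` links: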

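from pathlib import PurePosixPath

# PurePosixPath keeps forward slashes on every platform (Path would print backslashes on Windows)
abs_p = PurePosixPath('https://docs.python.org/3/library/pathlib.html')
rel_p = abs_p.relative_to('https://docs.python.org/3/')
print(rel_p)  # library/pathlib.html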
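
For the `../` case in the question, one possible approach is `posixpath.relpath` from the standard library, applied to the path component of each URL. The sketch below is only an illustration: `relative_link` is a made-up helper name, `/folder1/index.html` is an assumed linking page, and it assumes both URLs live on the same site and that query strings and fragments can be dropped.

```python
import posixpath
from urllib.parse import urlsplit


def relative_link(target_url, page_url):
    """Rewrite target_url's path relative to the page that links to it.

    Hypothetical helper: assumes both URLs are on the same host and
    ignores any query string or fragment.
    """
    target_path = urlsplit(target_url).path
    # A link resolves against the directory containing the page,
    # not against the page file itself.
    page_dir = posixpath.dirname(urlsplit(page_url).path) or "/"
    return posixpath.relpath(target_path, start=page_dir)


# The two cases from the question, each seen from an assumed linking page.
print(relative_link("/folder1/folder2/somepage.html", "/folder1/index.html"))
print(relative_link("/folder1/otherpage.html", "/folder1/folder2/somepage.html"))
```

Using `posixpath` rather than `os.path` keeps the separators as forward slashes regardless of the operating system the scraper runs on.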
