BeautifulSoup: how do I get the URL from an img src when it contains ../..?

Time: 2012-11-15 18:19:04

Tags: python beautifulsoup

Say I'm trying to get the link to some image, like this:

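import os
import urllib2
import urlparse

from bs4 import BeautifulSoup

# BeautifulSoup parses markup; it does not download a URL, so fetch the page first
html = urllib2.urlopen("http://examplesite.com").read()
soup = BeautifulSoup(html, "html.parser")
for image in soup.find_all("img", src=True):  # skip <img> tags without a src
    src = image["src"]  # raw attribute value, possibly relative
    srcd = urlparse.urlparse(src)
    path = srcd.path  # gets the path
    fn = os.path.basename(path)  # gets the filename

# Let's say the webpage I was scraping had its images like this:
# <img src="../..someimage.jpg" />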

Is there a simple way to get the full URL, or do I have to use a regular expression?

1 answer:

Answer 0 (score: 2)

Use urlparse.urljoin, which resolves a relative reference against a base URL:

>>> import urlparse
>>> base_url = "http://example.com/foo/"
>>> urlparse.urljoin(base_url, "../bar")
'http://example.com/bar'
>>> urlparse.urljoin(base_url, "/baz")
'http://example.com/baz'
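
To tie this back to the question, here is a minimal sketch of resolving every img src on a page against the page's own URL. It makes a few assumptions that are not in the original post: it targets Python 3, where urljoin lives in urllib.parse rather than urlparse; it uses the standard library's html.parser in place of BeautifulSoup so that it runs without third-party packages; and the page URL, the sample markup, and the names ImgSrcCollector and absolute_image_urls are all made up for illustration.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the raw src value of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def absolute_image_urls(page_url, html):
    """Resolve each img src against the URL the page was fetched from."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


# Assumed sample input; in practice page_url is the address you downloaded.
page_url = "http://example.com/a/b/page.html"
html = '<img src="../../someimage.jpg" /><img src="/static/logo.png" />'

urls = absolute_image_urls(page_url, html)
filenames = [posixpath.basename(urlparse(u).path) for u in urls]

print(urls)
print(filenames)
```

With BeautifulSoup the join step would look the same: pass each image["src"] to urljoin together with the page URL. The important detail is that the base must be the URL of the page itself, since that is what "../" is relative to, so no regular expression is needed.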