Accurately counting the number of images linked from a page

Time: 2017-10-24 12:01:11

Tags: python web-scraping beautifulsoup html-parsing

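I am trying to count the images (extensions .jpg, .png, .jpeg) linked from a page using Python. I can use any library, such as BeautifulSoup. But how do I do that? I am using the following code: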

from bs4 import BeautifulSoup
soup = BeautifulSoup(open('HTMLS%5C110k_Source.htm'), "html.parser")
img_links = len(soup.find_all('.jpg'))
print("Number of Images : ", img_links)

But it does not work.

2 answers:

Answer 0 (score: 0)

If you read the docs, this is as simple as writing a loop:

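import bs4
import requests

# extensions to count, taken from the question (file_types was undefined in the original snippet)
file_types = ('jpg', 'jpeg', 'png')

url = 'http://somefoobar.net'  # placeholder URL; requests needs an explicit scheme
page = requests.get(url).text
soup = bs4.BeautifulSoup(page, 'lxml')

# src=True skips <img> tags that have no src attribute
images = soup.find_all('img', src=True)

# loop through all img elements found and store the urls with matching extensions
urls = [x['src'] for x in images if x['src'].split('.')[-1].lower() in file_types]

print(urls)
print(len(urls))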
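
Splitting on '.' can miscount in a few cases: a src with a query string (b.png?v=2) is missed, and a src that contains no dot at all but is literally "jpg" is counted. A minimal sketch of a stricter check, using only the standard library, might look like the following. The names IMAGE_EXTENSIONS and has_image_extension are hypothetical helpers, not part of BeautifulSoup or of the answer above.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Assumed set of extensions, taken from the question (.jpg, .png, .jpeg).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def has_image_extension(src):
    """Return True if the path part of src ends in one of IMAGE_EXTENSIONS."""
    path = urlsplit(src).path        # drops any ?query and #fragment
    ext = splitext(path)[1].lower()  # ".PNG" becomes ".png"; no dot gives ""
    return ext in IMAGE_EXTENSIONS


# Made-up src values for illustration only.
srcs = ["a.jpg", "img/b.PNG?v=2", "c.gif", "photo.jpeg#top", "jpg"]
print(sum(has_image_extension(s) for s in srcs))  # prints 3
```

With BeautifulSoup, the same helper could replace the split('.') test inside the list comprehension.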

Answer 1 (score: 0)

You can try lxml.html, as shown below:

from lxml import html
with open('HTMLS%5C110k_Source.htm', 'r') as f:
    source = html.fromstring(f.read())
    print(len(source.xpath('//img[contains(@src, ".jpg") or contains(@src, ".jpeg") or contains(@src, ".png")]')))
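
Note that the XPath contains() test is case-sensitive and matches the extension anywhere in src, not only at the end. For comparison, here is a rough sketch of the same count using only the standard library's html.parser instead of lxml. ImageCounter and count_images are hypothetical names, and the sample markup is made up; this version does not handle query strings after the extension.

```python
from html.parser import HTMLParser


class ImageCounter(HTMLParser):
    """Counts <img> tags whose src ends in one of the given extensions."""

    def __init__(self, extensions=(".jpg", ".jpeg", ".png")):
        super().__init__()
        self.extensions = tuple(extensions)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""  # missing or empty src becomes ""
        if src.lower().endswith(self.extensions):
            self.count += 1


def count_images(html_text):
    """Parse html_text and return the number of matching <img> tags."""
    parser = ImageCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


# Made-up markup for illustration only.
sample = '<p><img src="a.jpg"><img src="b.PNG"/><img src="c.gif"><img alt="no src"></p>'
print(count_images(sample))  # prints 2
```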