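I am trying to use Python to count the images (extensions .jpg, .png, .jpeg) linked from an HTML page. I can use any library, such as BeautifulSoup, but how do I do it? I am using the following code: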
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('HTMLS%5C110k_Source.htm'), "html.parser")
img_links = len(soup.find_all('.jpg'))
print("Number of Images : ", img_links)
But it is all in vain.
Answer 0: (score: 0)
If you read the docs, this is as simple as writing a loop:
import bs4
import requests
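url = 'http://somefoobar.net'  # placeholder URL; requests needs the scheme
page = requests.get(url).text
soup = bs4.BeautifulSoup(page, 'lxml')
images = soup.find_all('img')
# extensions to count (lowercase, without the dot)
file_types = ('jpg', 'jpeg', 'png')
# loop through all img elements found and store the urls with matching extensions
urls = [x['src'] for x in images if x.get('src', '').lower().split('.')[-1] in file_types]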
print(urls)
print(len(urls))
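One caveat with splitting on the last dot: a src such as img/b.PNG?width=200 carries a query string after the extension, so the last piece is not a clean extension. A minimal sketch of a stricter check, using only the standard library, is below. The names has_image_extension and IMAGE_EXTENSIONS and the sample sources are illustrative assumptions, not part of the original answer.

```python
import os
from urllib.parse import urlparse

# Extensions named in the question; this set is an assumption you can extend.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def has_image_extension(src):
    """Return True if the path part of src ends in one of IMAGE_EXTENSIONS."""
    path = urlparse(src).path        # drops any ?query and #fragment
    ext = os.path.splitext(path)[1]  # last extension only, e.g. ".PNG"
    return ext.lower() in IMAGE_EXTENSIONS

# Hypothetical src values standing in for the img elements of a real page.
sources = ["a.jpg", "img/b.PNG?width=200", "c.gif", "d.jpeg#top", "page.jpg.html", ""]
matches = [s for s in sources if has_image_extension(s)]
print(len(matches))
```

With a parsed page, the same helper could be applied to each element's src, for example sum(has_image_extension(img.get('src', '')) for img in soup.find_all('img')).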
Answer 1: (score: 0)
You can try using lxml.html, as follows:
from lxml import html
with open('HTMLS%5C110k_Source.htm', 'r') as f:
    source = html.fromstring(f.read())
print(len(source.xpath('//img[contains(@src, ".jpg") or contains(@src, ".jpeg") or contains(@src, ".png")]')))
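Note that contains() matches the substring anywhere in src, so a value like page.jpg.html would also be counted, while an uppercase .JPG would not. If installing lxml is not an option, roughly the same count can be sketched with the standard library's html.parser instead. This is a substitute technique, not the method this answer uses, and the names ImageCounter, count_images, and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Extensions named in the question (lowercase, with the dot).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

class ImageCounter(HTMLParser):
    """Counts <img> tags whose src path ends in one of IMAGE_EXTENSIONS."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Compare only the path, case-insensitively, ignoring ?query and #fragment.
        if urlparse(src).path.lower().endswith(IMAGE_EXTENSIONS):
            self.count += 1

def count_images(html_text):
    parser = ImageCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count

# Hypothetical markup standing in for the contents of the saved .htm file.
sample = '<p><img src="a.jpg"><img src="b.PNG?x=1"/><img src="c.gif"><img alt="no src"></p>'
print(count_images(sample))
```

For a saved page, the text read from the file (as in the with block above) would be passed to count_images in place of sample.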