Why doesn't my CSS selector work with BeautifulSoup when it works as a Chrome console query?

Posted: 2019-05-30 08:04:54

Tags: python css beautifulsoup

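I have a CSS selector that works fine when I run it in the Chrome JS console. When I run it through BeautifulSoup, though, it fails on one example but works on another, and I can't tell what the difference between the two is.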

url_1 = 'https://www.amazon.com/s?k=bacopa&page=1'
url_2 = 'https://www.amazon.com/s?k=acorus+calamus&page=1'

The following query works for both URLs when executed in the Chrome console:

document.querySelectorAll('div.s-result-item') 
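For reference, BeautifulSoup's `select()` is the closest counterpart to `document.querySelectorAll()`: it takes a CSS selector (recent BeautifulSoup versions delegate this to the soupsieve package) and returns the matching tags. The minimal sketch below runs the same selector against a small hand-written HTML string, which is an assumption standing in for Amazon's real markup, so it needs no network access. Note that `div.s-result-item` also matches a `div` that carries extra classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a search results page.
SAMPLE_HTML = """
<div class="s-main-slot">
  <div class="s-result-item s-asin">Result A</div>
  <div class="s-result-item">Result B</div>
  <span class="s-result-item">Not a div</span>
  <div class="s-other">Unrelated</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same selector as document.querySelectorAll('div.s-result-item') in Chrome.
listings = soup.select("div.s-result-item")
print(len(listings))  # 2: the <span> and the unrelated <div> are excluded
```

If the selector behaves like this on a known string but returns 0 on a live page, the selector itself is probably fine and the HTML being parsed is what differs.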

Then I ran both URLs through BeautifulSoup, and this is the output I got:

url_1 (works)

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get(url_1, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
listings = soup.select('div.s-result-item')
print(len(listings))

Output: 53 (correct)

url_2 (doesn't work)

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get(url_2, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
listings = soup.select('div.s-result-item')
print(len(listings))

Output: 0 (wrong; expected: 49)

Does anyone know what might be happening here, and how to get the CSS selector working with BeautifulSoup?
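One way to narrow this down is to check whether the class name appears in the raw response at all before blaming the selector. If `s-result-item` is absent from the response text, the server sent `requests` different HTML than it sent Chrome (for example a bot-check page, or markup that Chrome only builds later with JavaScript), and no selector or parser change will help. If the string is present but `select()` still finds nothing, the parser is the more likely suspect. The helper name `diagnose_listings` is hypothetical, and the sketch runs on inline strings rather than a live request.

```python
from bs4 import BeautifulSoup


def diagnose_listings(html, parser="html.parser"):
    """Hypothetical helper: compare a raw text search with a parsed CSS match.

    Returns (raw_hits, parsed_hits):
      raw_hits: how many times the class name occurs anywhere in the text
      parsed_hits: how many <div> elements the parser exposes with that class
    """
    raw_hits = html.count("s-result-item")
    parsed_hits = len(BeautifulSoup(html, parser).select("div.s-result-item"))
    return raw_hits, parsed_hits


# Hypothetical page that never contained the listings.
blocked = "<html><body><p>Robot check</p></body></html>"
# Hypothetical page that does contain them.
normal = '<div class="s-result-item">A</div><div class="s-result-item">B</div>'

print(diagnose_listings(blocked))  # (0, 0): the HTML itself is different
print(diagnose_listings(normal))   # (2, 2): selector and parser agree
```

With the question's code this would be called as `diagnose_listings(r.text)` right after `requests.get(...)`. A non-zero first value with a zero second value points at parsing; `(0, 0)` points at the response itself.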

2 answers:

Answer 0 (score: 0)

Try downloading the webpage with the Selenium library:

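from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

url_1 = 'https://www.amazon.com/s?k=bacopa&page=1'
url_2 = 'https://www.amazon.com/s?k=acorus+calamus&page=1'

# Set the ChromeDriver path (Selenium 4 takes it via a Service object)
driver = webdriver.Chrome(service=Service('/usr/bin/chromedriver'))

try:
    # Download the webpage in a real browser, so the HTML reflects what Chrome renders
    driver.get(url_2)

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    listings = soup.find_all('div', {'class': 's-result-item'})

    print(len(listings))
finally:
    # Close the browser even if parsing fails
    driver.quit()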

Output:

url_1: 50

url_2: 48
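Besides fetching the page with a real browser, this answer also swaps `select('div.s-result-item')` for `find_all('div', {'class': 's-result-item'})`. On the same HTML the two calls should match the same elements, because BeautifulSoup treats `class` as a multi-valued attribute and matches a tag if any of its classes equals the given string. So the fix here most likely comes from the different HTML (the browser-rendered `page_source` versus the `requests` response) rather than from the change of method. A small check on an inline string, used here as an assumed stand-in for `driver.page_source`:

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; any HTML string is handled the same way.
page_source = """
<div class="s-result-item s-asin" data-index="0">A</div>
<div class="s-result-item" data-index="1">B</div>
<div class="sg-col s-result-item-wrapper">C</div>
"""

soup = BeautifulSoup(page_source, "html.parser")

# CSS selector form used in the question.
by_select = soup.select("div.s-result-item")
# Keyword form used in this answer: matches if ANY class equals 's-result-item'.
by_find_all = soup.find_all("div", {"class": "s-result-item"})

print(len(by_select), len(by_find_all))  # 2 2
print(by_select == by_find_all)          # True
```

The third `div` is matched by neither call: `s-result-item-wrapper` is a different class, and neither form does substring matching on individual classes.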

Answer 1 (score: 0)

I think the problem is html.parser. Change the parser to "lxml". You can also shorten the CSS selector to just the class, and reuse the connection with a Session object for efficiency.
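The code that accompanied this answer did not survive, so the following is only a minimal sketch of what the answer describes, not the original. It assumes the `lxml` package is installed (`pip install lxml`), reuses one `requests.Session` for all URLs so the underlying connection and any cookies carry over, and uses the shorter `.s-result-item` selector. The function names `count_listings` and `fetch_counts` are hypothetical, and Amazon may still serve different or blocked content to scripted requests, so the counts are not guaranteed to match the browser.

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
    )
}


def count_listings(html, parser="lxml"):
    """Hypothetical helper: count result items with the shortened class selector."""
    soup = BeautifulSoup(html, parser)
    # '.s-result-item' matches any tag with that class, not only <div>.
    return len(soup.select(".s-result-item"))


def fetch_counts(urls, parser="lxml"):
    """Hypothetical helper: fetch each URL over one shared Session, return {url: count}."""
    counts = {}
    with requests.Session() as session:
        # Headers set on the session are sent with every request it makes.
        session.headers.update(HEADERS)
        for url in urls:
            response = session.get(url)
            response.raise_for_status()
            counts[url] = count_listings(response.content, parser)
    return counts
```

For example, `fetch_counts([url_1, url_2])` fetches both search pages over one connection. To compare parsers on a single response, call `count_listings(response.content, "html.parser")` and `count_listings(response.content)` on the same `response`.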