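I have a CSS selector that works fine when I run it in the Chrome JS console, but when I run it through BeautifulSoup it fails on one example and works on another (I can't tell what differs between the two).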
url_1 = 'https://www.amazon.com/s?k=bacopa&page=1'
url_2 = 'https://www.amazon.com/s?k=acorus+calamus&page=1'
For both URLs, the following query works fine when run in the Chrome console.
document.querySelectorAll('div.s-result-item')
I then ran both URLs through BeautifulSoup; this is the output I got.
url_1 (works)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get(url_1, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
listings = soup.select('div.s-result-item')
print(len(listings))
Output: 53 (correct)
url_2 (does not work)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get(url_2, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
listings = soup.select('div.s-result-item')
print(len(listings))
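Output: 0 (wrong; expected: 49)
Does anyone know what might be happening here, and how to get the CSS selector working with BeautifulSoup?
Answer 0: (score: 0)
Try the selenium library.
Download the web page: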
from selenium import webdriver
from bs4 import BeautifulSoup
url_1 = 'https://www.amazon.com/s?k=bacopa&page=1'
url_2 = 'https://www.amazon.com/s?k=acorus+calamus&page=1'
#set chrome webdriver path
driver = webdriver.Chrome('/usr/bin/chromedriver')
#download webpage
driver.get(url_2)
soup = BeautifulSoup(driver.page_source, 'html.parser')
listings = soup.find_all('div',{'class':'s-result-item'})
print(len(listings))
Output:
url_1: 50
url_2: 48
Answer 1: (score: 0)
I think the issue is with the HTML. Change the parser to 'lxml'. You can also shorten the CSS selector to just the class, and reuse the connection with a Session object for efficiency.
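The suggestions in this answer could be sketched roughly as follows. This is a minimal sketch, not the answerer's original code: the helper names `count_listings` and `fetch_counts` are hypothetical, it assumes the `lxml` package is installed alongside `requests` and `bs4`, and whether Amazon returns the full results page to a plain HTTP client is not guaranteed.

```python
import requests
from bs4 import BeautifulSoup

# Same User-Agent as in the question.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/74.0.3729.169 Safari/537.36'
}


def count_listings(html, parser='lxml'):
    """Parse the HTML and count elements carrying the s-result-item class."""
    soup = BeautifulSoup(html, parser)
    # Class-only selector: matches any tag with this class, not just <div>.
    return len(soup.select('.s-result-item'))


def fetch_counts(urls):
    """Fetch each URL over one shared Session and count the listings."""
    counts = {}
    with requests.Session() as session:
        # Headers set on the session are sent with every request made through it.
        session.headers.update(HEADERS)
        for url in urls:
            r = session.get(url)
            counts[url] = count_listings(r.content)
    return counts


if __name__ == '__main__':
    url_1 = 'https://www.amazon.com/s?k=bacopa&page=1'
    url_2 = 'https://www.amazon.com/s?k=acorus+calamus&page=1'
    for url, n in fetch_counts([url_1, url_2]).items():
        print(url, n)
```

The Session keeps the underlying connection open between the two requests, and the `parser` argument makes it easy to compare 'lxml' against 'html.parser' on the same response.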