我正在尝试将此页面抓取10个class='name main-name'
,例如:sample source
但是当我编码时:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://genvita.vn/thu-thach/7-ngay-detox-da-dep-dang-thon-nguoi-khoe-qua-soc-len-den-8-trieu-dong")
c = result.text
soup = BeautifulSoup(c, "html.parser")
comment_items = soup.find_all('div', class_="name main-name")
print(len(comment_items)
但是返回:0不返回:10。我尝试搜索并在stackoverflow中使用了许多解决方案,但无法解决
答案 0 :(得分:0)
正如我在评论中所述,它是动态生成的。所以这是Selenium的实现:
from selenium import webdriver
from bs4 import BeautifulSoup
url = "https://genvita.vn/thu-thach/7-ngay-detox-da-dep-dang-thon-nguoi-khoe-qua-soc-len-den-8-trieu-dong"
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(url)
c = driver.page_source
soup = BeautifulSoup(c, "html.parser")
comment_items = soup.find_all('div', {'class':"name main-name"})
print (len(comment_items))
driver.close()
输出:
print (len(comment_items))
10
答案 1 :(得分:0)
因为div name main-name
没有出现在您的DOM
中。在这种情况下,使用Selenium
比BeautifulSoap
from selenium import webdriver
driver_path = r'Your Chrome driver path'
browser = webdriver.Chrome(executable_path=driver_path)
browser.get("https://genvita.vn/thu-thach/7-ngay-detox-da-dep-dang-thon-nguoi-khoe-qua-soc-len-den-8-trieu-dong")
get_element = browser.find_elements_by_css_selector("div[class='name main-name']")
print len(get_element)
browser.close()
输出:
10
您还可以获得类似的名称:
for users in get_element:
print(users.text)
输出:
Phạm Thị Kim Chi
My Linh Nguyen
Mr Vinh Bảo Hiểm Sức Khoẻ Sắc Đẹp
Ngô Thị Tuyết
Huỳnh Thị Bích Trâm
Linh Trúc Diêm
Nguyen Tu
Nguyen Thom
Hồ Thu Trang
Trầnthịtrắng
答案 2 :(得分:0)
您可以使用beautifulsoup4选择功能
import requests
from bs4 import BeautifulSoup
result = requests.get("https://genvita.vn/thu-thach/7-ngay-detox-da-dep-dang-thon-nguoi-khoe-qua-soc-len-den-8-trieu-dong")
c = result.text
soup = BeautifulSoup(c, "html.parser")
comment_items = soup.select("div.name.main-name")
print(len(comment_items))