This is my first attempt at web scraping with BeautifulSoup and Python. The page I'm trying to scrape is at: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172
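# Imports assumed from the aliases used below
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup

# Fetch the raw HTML exactly as the server returns it
client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')
identification = page_soup.find('div', {'data-bind':'text: name'})
print(identification.text)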
When I run this, I just get an empty string. If I print the identification variable itself, I get:
<div class="col-xs-7" data-bind="text: name"></div>
Answer 0 (score: 0)
You can try the following code:
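from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
# A real browser runs the page's JavaScript, which fills in the text
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
# find_element_by_xpath was removed in newer Selenium 4 releases; use find_element(By.XPATH, ...)
find = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')
print(find.text)
driver.quit()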
Output:
A LEBLANC
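As a rough sketch of why the plain request came back empty: the `data-bind="text: name"` attribute looks like a Knockout.js binding, so the text is presumably filled in by JavaScript after the page loads, which a plain HTTP request never executes. Parsing the static markup quoted in the question shows that the element is found but carries no text. The snippet below uses only that quoted markup, not the live page.

```python
from bs4 import BeautifulSoup

# Static markup as quoted in the question: the element exists but has no text yet.
static_html = '<div class="col-xs-7" data-bind="text: name"></div>'

static_soup = BeautifulSoup(static_html, "html.parser")
div = static_soup.find("div", {"data-bind": "text: name"})

print(div is not None)  # the element itself is found
print(repr(div.text))   # its text is an empty string
```

That is why a browser-driven tool such as Selenium is needed here: it lets the script run before the text is read.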
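Answer 1 (score: 0)
There are several ways to achieve the same result. In my script I used a CSS selector, which is easy to read and unlikely to break unless the site's HTML structure changes significantly. Try this.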
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
# Parse the browser-rendered HTML, then close the browser
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
# Take the first element whose data-bind attribute ends with "name"
item_name = soup.select("[data-bind$='name']")[0].text
print(item_name)
Result:
A LEBLANC
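The `$=` selector used above is a suffix match, so it may be worth checking what else it could pick up. The sketch below runs it against made-up markup: only the first `div` mirrors the real page, while the `flag` and `classname` bindings and their texts are hypothetical and exist purely to illustrate the difference between a suffix match and an exact match.

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup; only the first div mirrors the real page.
rendered_html = """
<div data-bind="text: name">A LEBLANC</div>
<div data-bind="text: flag">Hypothetical Flag</div>
<div data-bind="text: classname">Hypothetical Class</div>
"""
soup = BeautifulSoup(rendered_html, "html.parser")

# $= matches any data-bind value that ends with "name".
suffix_matches = [el.text for el in soup.select("[data-bind$='name']")]

# = requires the whole attribute value to match.
exact_matches = [el.text for el in soup.select("[data-bind='text: name']")]

print(suffix_matches)
print(exact_matches)
```

Taking `[0]` works as long as the wanted element comes first in the document; the exact-match form is a stricter alternative if other bindings ending in "name" ever appear.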
By the way, the lookup you started with also works once the HTML comes from the browser:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
# Parse the browser-rendered HTML, then close the browser
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()
# Same find() call as in the question, now on the rendered page
item_name = soup.find('div', {'data-bind':'text: name'}).text
print(item_name)