Why am I getting the element itself instead of the field's value?

Time: 2017-10-12 17:31:25

Tags: python html web-scraping beautifulsoup

This is my first attempt at web scraping with BeautifulSoup and Python. The page I am trying to scrape is: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172

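# Imports assumed from the aliases used below
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup

# Fetch the page and parse the returned HTML
client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')

# Look up the div whose data-bind attribute is "text: name"
identification = page_soup.find('div', {'data-bind':'text: name'})
print(identification.text)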

When I do this, I just get an empty string. If I print the identification variable itself, I get:

<div class="col-xs-7" data-bind="text: name"></div>

This is the line of HTML whose value I am trying to get. On the page itself, that tag contains the value A LEBLANC.

2 answers:

Answer 0: (score: 0)

You can try the following code:

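from selenium import webdriver
from selenium.webdriver.common.by import By

# Open the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')

# find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
find = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')
print(find.text)
driver.quit()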

Output:

A LEBLANC

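Answer 1: (score: 0)

There are several ways to achieve the same result. In my script, however, I used a CSS selector, which is easy to understand and unlikely to break unless the site's HTML structure changes significantly. Try this.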

from selenium import webdriver
from bs4 import BeautifulSoup

# Let the browser render the page, then hand the rendered HTML to BeautifulSoup
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# Select the first element whose data-bind attribute ends with "name"
item_name = soup.select("[data-bind$='name']")[0].text
print(item_name)

Result:

A LEBLANC
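
To make the selector concrete: `[data-bind$='name']` matches any element whose `data-bind` attribute ends with `name`, so it would also match another binding that happens to end the same way. The sketch below runs against a hypothetical rendered snippet (the markup and the `classname` binding are invented for illustration, not copied from the real page) and compares the suffix match with an exact attribute match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for the HTML after the browser has
# filled in the values. It is not copied from the real page.
rendered_html = """
<div class="row">
  <div class="col-xs-5">Name</div>
  <div class="col-xs-7" data-bind="text: name">A LEBLANC</div>
  <div class="col-xs-7" data-bind="text: classname">(other field)</div>
</div>
"""

soup = BeautifulSoup(rendered_html, "html.parser")

# Suffix match: every data-bind value that ends with "name"
suffix_matches = [el.get_text() for el in soup.select("[data-bind$='name']")]

# Exact match: only the element bound to "text: name"
exact_match = soup.select_one("[data-bind='text: name']")

print(suffix_matches)
print(exact_match.get_text())
```

Taking index `[0]` of the suffix match works as long as the wanted element comes first in the document; the exact match is stricter and does not depend on element order.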

By the way, the approach you started with also works once the HTML comes from the browser:

from selenium import webdriver
from bs4 import BeautifulSoup

# Same rendering step as before: the browser produces the final HTML
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

# The original find() call, now applied to the rendered page
item_name = soup.find('div', {'data-bind':'text: name'}).text
print(item_name)
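
The `find` call is the same in both versions; what differs is the HTML it is given. A small sketch of that difference, using the empty div printed in the question and an assumed rendered version of the same div (the rendered string is a guess at what the browser produces, and `name_text` is a hypothetical helper):

```python
from bs4 import BeautifulSoup

# The div exactly as printed in the question (raw server response)
raw_html = '<div class="col-xs-7" data-bind="text: name"></div>'

# Assumed shape of the same div after the browser has filled it in
rendered_html = '<div class="col-xs-7" data-bind="text: name">A LEBLANC</div>'


def name_text(html):
    """Run the question's find() call against the given HTML and return the text."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"data-bind": "text: name"}).text


print(repr(name_text(raw_html)))       # prints ''
print(repr(name_text(rendered_html)))  # prints 'A LEBLANC'
```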