How to extract information from an HTML page with Python?

Posted: 2018-11-26 01:04:30

Tags: html css python-2.7

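Based on the HTML page below, I want to extract the following information about this property:

1- Number of bathrooms

2- Living area

3- Energy rating

4- Description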

                

                <div class="bloco-imovel-resumo-dados">
                    <div id="Cpl_modulodadosresumidos_module_holder" class="modulo-dados-resumidos">

<h2 class="lbl_descricao_dados">Property Information</h2>

<ul class="bloco-dados">

    <li>
        <b>Condition:</b> <span>Renewed</span></li>
    <li>
        <b>Living Area:</b><span> 80 m<sup>2</sup></span></li>
    <li>
        <b>Total Area:</b><span> 0 m<sup>2</sup></span></li>
    <li>
        <b>Bathrooms:</b><span> 1 </span></li>
    <li>
        <b>Bedrooms:</b><span> 2 </span></li>
    <li>
        <b>Energy Rating:</b><span> C</span></li>

</ul>

                    

                <div class="bloco-imovel-texto">
                    <h3 class="lbl_description">
                        Description </h3>
                    <p>At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident.Nam libero tempore, omnis dolor repellendus.</p>
                </div>

I tried to extract the number of bathrooms with the code below, but I get this error: AttributeError: 'HtmlElement' object has no attribute 'find_element_by_css_selector'

from lxml import html,etree

with open(r'listing.html', "r") as f:

    page = f.read()

    tree = html.fromstring(page)

    Bathrooms = tree.find_element_by_css_selector('Bathrooms')

print('Bathrooms: {}'.format(tree.cssselect(Bathrooms)[0].text))
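The error occurs because find_element_by_css_selector comes from Selenium's WebDriver API; lxml's HtmlElement has no such method. lxml elements offer xpath() instead (cssselect() also exists, but it needs the separate cssselect package, and a selector like 'Bathrooms' would match a <Bathrooms> tag, not the label text). The sketch below is one possible approach, assuming the markup shown above is representative of listing.html; the helper name field and the inline HTML string are illustrative only. It selects the <li> whose <b> label reads "Bathrooms:" and returns the text of its <span>.

```python
from lxml import html

# Trimmed copy of the <ul class="bloco-dados"> markup from the question (illustrative)
LISTING_HTML = """<ul class="bloco-dados">
    <li><b>Living Area:</b><span> 80 m<sup>2</sup></span></li>
    <li><b>Bathrooms:</b><span> 1 </span></li>
    <li><b>Energy Rating:</b><span> C</span></li>
</ul>"""

tree = html.fromstring(LISTING_HTML)


def field(tree, label):
    """Hypothetical helper: return the <span> text of the <li> whose <b> reads `label`, or None."""
    # $label is an XPath variable; lxml fills it from the keyword argument
    spans = tree.xpath("//li[normalize-space(b) = $label]/span", label=label)
    # text_content() includes child text too, e.g. the "2" inside <sup>2</sup>
    return spans[0].text_content().strip() if spans else None


print(field(tree, "Bathrooms:"))      # 1
print(field(tree, "Living Area:"))    # 80 m2
print(field(tree, "Energy Rating:"))  # C
```

For the real file, replacing LISTING_HTML with the contents read from listing.html should work the same way, since the XPath searches the whole document.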

I'm a beginner with HTML and CSS, so I need your help.

1 answer:

Answer 0 (score: 0)

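import lxml.html

with open(r'listing.html', "r") as f:
    page = f.read()

# fromstring() parses the markup text; parse() expects a filename or file object
root = lxml.html.fromstring(page)
# the details list is <ul class="bloco-dados">, not a div with class "bloco-dado"
object_list = root.xpath(".//ul[@class='bloco-dados']")
details = object_list[0]
# text_content() returns all text in the list, including "Bathrooms: 1"
text = details.text_content()
print(text)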

Give it a try; it may work.
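If installing lxml is not an option, the same four fields can be collected with Python's built-in html.parser module. This is a swapped-in technique, not what the answer above uses: a minimal sketch written for Python 3 (in Python 2.7 the module is named HTMLParser), assuming the labels always sit in <b> and the values in the following <span> inside <ul class="bloco-dados">, and that the description is the <p> inside <div class="bloco-imovel-texto">. The class name ListingParser and the trimmed LISTING_HTML string are hypothetical.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (illustrative, description shortened)
LISTING_HTML = """
<ul class="bloco-dados">
    <li><b>Condition:</b> <span>Renewed</span></li>
    <li><b>Living Area:</b><span> 80 m<sup>2</sup></span></li>
    <li><b>Bathrooms:</b><span> 1 </span></li>
    <li><b>Energy Rating:</b><span> C</span></li>
</ul>
<div class="bloco-imovel-texto">
    <h3 class="lbl_description">Description </h3>
    <p>At vero eos et accusamus et iusto odio dignissimos ducimus.</p>
</div>
"""


class ListingParser(HTMLParser):
    """Hypothetical parser: collects <b>label</b><span>value</span> pairs from
    ul.bloco-dados and the <p> text inside div.bloco-imovel-texto."""

    def __init__(self):
        super().__init__()
        self.fields = {}        # e.g. {"Bathrooms": "1"}
        self.description = ""
        self._in_list = False
        self._in_text_block = False
        self._current = None    # tag whose text is being collected: "b", "span" or "p"
        self._label = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "bloco-dados" in classes:
            self._in_list = True
        elif tag == "div" and "bloco-imovel-texto" in classes:
            self._in_text_block = True
        elif self._in_list and tag == "b":
            self._current, self._label = "b", ""
        elif self._in_list and tag == "span":
            self._current, self._value = "span", ""
        elif self._in_text_block and tag == "p":
            self._current = "p"

    def handle_endtag(self, tag):
        if tag == "ul":
            self._in_list = False
        elif tag == "b" and self._current == "b":
            self._current = None
        elif tag == "span" and self._current == "span":
            # drop the trailing colon from the label, e.g. "Bathrooms:" -> "Bathrooms"
            self.fields[self._label.strip().rstrip(":")] = self._value.strip()
            self._current = None
        elif tag == "p" and self._current == "p":
            self._current = None
            self._in_text_block = False

    def handle_data(self, data):
        # text inside <sup> (the "2" in m2) is appended to the enclosing <span>
        if self._current == "b":
            self._label += data
        elif self._current == "span":
            self._value += data
        elif self._current == "p":
            self.description += data


parser = ListingParser()
parser.feed(LISTING_HTML)
parser.close()
print(parser.fields["Bathrooms"])      # 1
print(parser.fields["Living Area"])    # 80 m2
print(parser.fields["Energy Rating"])  # C
print(parser.description)
```

A state-machine parser like this is more verbose than an XPath query, but it has no third-party dependency; for anything beyond a few fixed fields, lxml or a similar library is usually easier to maintain.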