Question

I am trying to extract information from an HTML table (found on this example page: https://www.detrasdelafachada.com/house-for-sale-marianao-havana-cuba/dcyktckvwjxhpl9):

<div class="row">
    <div class="col-label">
        Type of property:
    </div>
    <div class="col-datos">
        Apartment </div>
</div>
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
<div class="row">
    <div class="col-label precio">
        Sale price:
    </div>
    <div class="col-datos precio">
        12 000 CUC </div>
</div>
<div class="row">
    <div class="col-label">
        Rooms:
    </div>
    <div class="col-datos">
        1 </div>
</div>
<div class="row">
    <div class="col-label">
        Bathrooms:
    </div>
    <div class="col-datos">
        1 </div>
</div>
<div class="row">
    <div class="col-label">
        Kitchens:
    </div>
    <div class="col-datos">
        1 </div>
</div>
<div class="row">
    <div class="col-label">
        Surface:
    </div>
    <div class="col-datos">
        38 mts2 </div>
</div>
<div class="row">
    <div class="col-label">
        Year of construction:
    </div>
    <div class="col-datos">
        1945 </div>
</div>
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
<div class="row">
    <div class="col-label">
        Construction type:
    </div>
    <div class="col-datos">
        Masonry and plate </div>
</div>
<div class="row">
    <div class="col-label">
        Home conditions:
    </div>
    <div class="col-datos">
        Good </div>
</div>
<div class="row">
    <div class="col-label">
        Other peculiarities:
    </div>
</div>
<div class="row">

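Using Beautiful Soup, how can I find the value of "Building style:" (and of the other entries)?

My problem is that I cannot find it directly by class, because all the entries in the table have the same div class name.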

Answer 1

You can iterate over each row div and find the nested div values:

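from bs4 import BeautifulSoup as soup
import re
# content holds the page's HTML source as a string
d = soup(content, 'html.parser')
# For each row div, collect the text of its nested divs with runs of whitespace removed
results = [[re.sub(r'\s{2,}|\n+', '', i.text) for i in b.find_all('div')] for b in d.find_all('div', {'class':'row'})]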

Output:

[['Type of property:', 'Apartment '], ['Building style:', '50 year '], ['Sale price:', '12 000 CUC '], ['Rooms:', '1 '], ['Bathrooms:', '1 '], ['Kitchens:', '1 '], ['Surface:', '38 mts2 '], ['Year of construction:', '1945 '], ['Building style:', '50 year '], ['Construction type:', 'Masonry and plate '], ['Home conditions:', 'Good '], ['Other peculiarities:'], []]
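
If you want to look a value up by its label instead of scanning a list of pairs, the rows can be folded into a dictionary. The following is only a sketch under some assumptions: the `html` string is a shortened, hypothetical copy of the markup from the question, and the name `details` is made up for illustration. Because "Building style:" appears twice on the page, a dictionary keeps only the last occurrence.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a shortened copy of the markup shown in the question.
html = """
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
<div class="row">
    <div class="col-label precio">
        Sale price:
    </div>
    <div class="col-datos precio">
        12 000 CUC </div>
</div>
<div class="row">
    <div class="col-label">
        Other peculiarities:
    </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

details = {}
for row in soup.find_all("div", class_="row"):
    # class_ matches any one of an element's classes, so "col-label precio" is found too.
    label = row.find("div", class_="col-label")
    value = row.find("div", class_="col-datos")
    if label is None:
        continue  # skip rows that carry no label at all
    key = label.get_text(strip=True).rstrip(":")
    # Rows without a value div (like "Other peculiarities:") map to None.
    details[key] = value.get_text(strip=True) if value is not None else None

print(details["Building style"])
```

Using `get_text(strip=True)` also removes the trailing spaces that are visible in the output above.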

Answer 2

If you know you are looking specifically for the string "Building style:", for example, you can find that string and capture the text of the next div, or just use .next_sibling. For example:

>>> from bs4 import BeautifulSoup
>>> html = "<c><div>hello</div> <div>hi</div></c>"
>>> soup = BeautifulSoup(html, 'html.parser')
>>> print(soup.find(string="hello").find_next('div').contents[0])
hi

If you want all of them, you can use .find_all to get all the div tags of class "row", then grab the children of each.
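
Applied to the markup from the question, the approach of locating a label and moving to the next div needs two small adjustments, sketched below. The label text is surrounded by newlines and indentation, so an exact `string="Building style:"` match would likely find nothing, and the plain `.next_sibling` of the label div is the whitespace between the two tags, not the value div. The `html` string here is a hypothetical one-row excerpt, and the variable names are illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample: one row copied from the markup in the question.
html = """
<div class="row">
    <div class="col-label">
        Building style:
    </div>
    <div class="col-datos">
        50 year </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The label is padded with whitespace, so search with a regex rather than an exact string.
label_text = soup.find(string=re.compile(r"Building style:"))

# Option 1: take the next <div> in document order after the label text.
value_a = label_text.find_next("div").get_text(strip=True)

# Option 2: step from the label's own <div> to its next sibling <div>.
# find_next_sibling("div") skips the whitespace text node that .next_sibling would return.
label_div = label_text.parent
value_b = label_div.find_next_sibling("div").get_text(strip=True)

print(value_a, value_b)
```

If the label is not present, `soup.find` returns `None`, so real code would want to check for that before chaining further calls.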

Finding information in an HTML table with Beautiful Soup
