Question

I have the following HTML code:

<div class="data-table data-table_detailed">
    <div class="cell">
        <div class="cell_label"> Label1 </div>
        <div class="cell_value"> Value2 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label2 </div>
        <div class="cell_value"> Value2 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label3 </div>
        <div class="cell_value"> Value3 </div>
    </div>
</div>

I want to use BeautifulSoup to get the value associated with Label2.

Here is what I have so far:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, "html.parser")  # page holds the HTML shown above
datatable = soup.find(class_="data-table data-table_detailed")
datatable.find_all(class_="cell_label")  # returns the list of label tags

But how do I get the value from the cell whose label is Label2?

Answer 1

You can use find_next_sibling:

from bs4 import BeautifulSoup

soup = BeautifulSoup(page, "html.parser")
datatable = soup.find(class_="data-table data-table_detailed")
cell_labels = datatable.find_all(class_="cell_label")  # list of label tags

for cell_label in cell_labels:
    # Match on the label text, then step to the value div next to it
    if "Label2" in cell_label.text:
        print(cell_label.find_next_sibling("div", {"class": "cell_value"}).text)

# results
 Value2
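
If you need this lookup more than once, the loop above can be wrapped in a small helper. This is only a sketch: `value_for_label` and `PAGE` are names made up for illustration, the sample markup is modeled on the question with closing tags written out, and the value shown for Label1 is an assumed placeholder. Unlike the substring test above, it compares the stripped text exactly, so "Label2" would not also match a label such as "Label20".

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question (closing tags added;
# the value for Label1 is an assumed placeholder).
PAGE = """
<div class="data-table data-table_detailed">
    <div class="cell">
        <div class="cell_label"> Label1 </div>
        <div class="cell_value"> Value1 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label2 </div>
        <div class="cell_value"> Value2 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label3 </div>
        <div class="cell_value"> Value3 </div>
    </div>
</div>
"""


def value_for_label(soup, label):
    """Return the stripped value text for an exact label match, or None."""
    for cell_label in soup.find_all("div", class_="cell_label"):
        # Compare the stripped text so surrounding spaces do not matter
        if cell_label.get_text(strip=True) == label:
            value = cell_label.find_next_sibling("div", class_="cell_value")
            return value.get_text(strip=True) if value is not None else None
    return None  # no label with that text


soup = BeautifulSoup(PAGE, "html.parser")
print(value_for_label(soup, "Label2"))  # prints: Value2
```

A label that is not present returns None instead of raising, which is easier to handle when the page layout varies.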

Answer 2

It is easier with the select method (CSS selectors):

value_tag = soup.select('.data-table.data-table_detailed .cell_value')[1]  # the second cell_value tag (index 1)
text = value_tag.get_text()  # the text inside that value element
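
Indexing with [1] only works while Label2 stays in second position. A possible variation, sketched below, pairs the labels with the values and looks the value up by label text. It assumes every cell has exactly one label and one value in document order; `PAGE` and `lookup` are illustrative names, and the Label1 value is an assumed placeholder.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question (closing tags added;
# the value for Label1 is an assumed placeholder).
PAGE = """
<div class="data-table data-table_detailed">
    <div class="cell">
        <div class="cell_label"> Label1 </div>
        <div class="cell_value"> Value1 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label2 </div>
        <div class="cell_value"> Value2 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label3 </div>
        <div class="cell_value"> Value3 </div>
    </div>
</div>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# Both selections come back in document order, so zip() pairs them up
labels = soup.select(".data-table.data-table_detailed .cell_label")
values = soup.select(".data-table.data-table_detailed .cell_value")

lookup = {
    label.get_text(strip=True): value.get_text(strip=True)
    for label, value in zip(labels, values)
}

print(lookup["Label2"])  # prints: Value2
```

If a cell can be missing its value, the pairing drifts, and the sibling-based approach from the other answers is the safer choice.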

Answer 3

This code finds the first <div> tag in the document that has the class cell_label and whose (stripped) content is Label2:

>>> soup.find('div', class_='cell_label', string=lambda s: s.strip() == 'Label2').find_next_sibling().string
u' Value2 '

If you only need to search inside the first <div class="data-table data-table_detailed">:

>>> table = soup.find(class_="data-table data-table_detailed")
>>> table.find('div', class_='cell_label', string=lambda s: s.strip() == 'Label2').find_next_sibling().string
u' Value2 '
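
One caveat with the lambda: a tag's .string is None when the tag has more than one child, and depending on the Beautiful Soup version the callable may be handed that None, in which case s.strip() would fail. A guarded variant is sketched below. `is_label` and `PAGE` are hypothetical names, and the first label in the sample deliberately contains nested markup to exercise the guard.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first label has two children (<b> plus text),
# so its .string is None.
PAGE = """
<div class="data-table data-table_detailed">
    <div class="cell">
        <div class="cell_label"><b>Label</b> 1</div>
        <div class="cell_value"> Value1 </div>
    </div>
    <div class="cell">
        <div class="cell_label"> Label2 </div>
        <div class="cell_value"> Value2 </div>
    </div>
</div>
"""


def is_label(wanted):
    """Build a string filter that tolerates None."""
    def check(s):
        return s is not None and s.strip() == wanted
    return check


soup = BeautifulSoup(PAGE, "html.parser")
label = soup.find("div", class_="cell_label", string=is_label("Label2"))

# Ask for the value div explicitly rather than whatever sibling comes next
value = label.find_next_sibling("div", class_="cell_value") if label else None
print(value.get_text(strip=True) if value else None)  # prints: Value2
```

Passing "div" and class_="cell_value" to find_next_sibling also keeps the lookup correct if another element is ever inserted between the label and its value.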

How do I use BeautifulSoup to extract the value associated with a label cell from an HTML table?
