Getting only the values from two different tables?

Time: 2019-07-27 13:40:34

Tags: python selenium xpath beautifulsoup

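I want to select data from two different tables that share the same class. I tried getting it with soup.find_all, but formatting the resulting data became increasingly difficult. I only need the values from the tables, not the labels.

Table 1: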

<div class="bh_collapsible-body" style="display: none;">
  <table border="0" cellpadding="2" cellspacing="2" class="prop-list">
    <tbody>
    <tr>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Rim Material</td>
            <td class="value">Alloy</td>
          </tr>
          </tbody>
        </table>
      </td>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Front Tyre Description</td>
            <td class="value">215/55 R16</td>
          </tr>
          </tbody>
        </table>
      </td>
    </tr>
    <tr>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Front Rim Description</td>
            <td class="value">16x7.0</td>
          </tr>
          </tbody>
        </table>
      </td>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Rear Tyre Description</td>
            <td class="value">215/55 R16</td>
          </tr>
          </tbody>
        </table>
      </td>
    </tr>
    <tr>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Rear Rim Description</td>
            <td class="value">16x7.0</td>
          </tr>
          </tbody>
        </table>
      </td>
      <td></td>
    </tr>
    </tbody>
  </table>
</div>
</div>

Table 2:

<div class="bh_collapsible-body" style="display: none;">
  <table border="0" cellpadding="2" cellspacing="2" class="prop-list">
    <tbody>
    <tr>
      <td class="item">
        <table>
          <tbody>
          <tr>
            <td class="label">Steering</td>
            <td class="value">Rack and Pinion</td>
          </tr>
          </tbody>
        </table>
      </td>
      <td></td>
    </tr>
    </tbody>
  </table>
</div>
</div>

What I have tried:

I tried to get the contents of the first table with XPath, but that gave me both the values and the labels.

table1 = driver.find_element_by_xpath("//*[@id='features']/div/div[5]/div[2]/div[1]/div[1]/div/div[2]/table/tbody/tr[1]/td[1]/table/tbody/tr/td[2]")

I tried to split the data, but without success.

1 Answer:

Answer 0: (score: 1)

I think you are looking for the CSS selector tr:not(:has(tr)), which selects only the innermost <tr> elements, that is, rows that contain no nested rows:

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser') # the variable data contains string for Table1 and Table2 in your question

rows = []
for tr in soup.select('tr:not(:has(tr))'):
    rows.append([td.get_text(strip=True) for td in tr.select('td')])

for row in zip(*rows):
    print(''.join('{: ^25}'.format(d) for d in row))

Prints:

      Rim Material        Front Tyre Description    Front Rim Description    Rear Tyre Description    Rear Rim Description           Steering
          Alloy                 215/55 R16                 16x7.0                 215/55 R16                 16x7.0               Rack and Pinion

The variable rows contains:

[['Rim Material', 'Alloy'], ['Front Tyre Description', '215/55 R16'], ['Front Rim Description', '16x7.0'], ['Rear Tyre Description', '215/55 R16'], ['Rear Rim Description', '16x7.0'], ['Steering', 'Rack and Pinion']]

Further reading:

CSS Selectors Reference

EDIT: Changed the CSS selector to tr:not(:has(tr))
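
If only the values are needed and they should stay grouped by table, one possible variation is to select the td cells with class "value" inside each outer table with class "prop-list". The sketch below is not part of the original answer: the html string is a shortened stand-in for the two tables in the question, and the name values_per_table is made up for illustration. It assumes the real page keeps the same label/value cell structure.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the two tables in the question (assumption:
# the real page uses the same td.label / td.value structure).
html = """
<table class="prop-list"><tbody><tr>
  <td class="item"><table><tbody><tr>
    <td class="label">Rim Material</td><td class="value">Alloy</td>
  </tr></tbody></table></td>
  <td class="item"><table><tbody><tr>
    <td class="label">Front Tyre Description</td><td class="value">215/55 R16</td>
  </tr></tbody></table></td>
</tr></tbody></table>
<table class="prop-list"><tbody><tr>
  <td class="item"><table><tbody><tr>
    <td class="label">Steering</td><td class="value">Rack and Pinion</td>
  </tr></tbody></table></td>
  <td></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the outer tables carry class "prop-list"; find_all searches their
# descendants, so the value cells of the nested tables are found too.
values_per_table = [
    [td.get_text(strip=True) for td in table.find_all("td", class_="value")]
    for table in soup.find_all("table", class_="prop-list")
]

print(values_per_table)
```

With Selenium, the same parsing could presumably be applied to driver.page_source, which also avoids reading text from elements that the page hides with display: none.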