我想从同一个类的两个不同表中获取或选择数据,我试图从'soup.find_all'获取它,但是格式化数据变得越来越困难。 有两个具有相同类的表。我只需要从表中获取值(而不是标签)。
表1:
<div class="bh_collapsible-body" style="display: none;">
<table border="0" cellpadding="2" cellspacing="2" class="prop-list">
<tbody>
<tr>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Rim Material</td>
<td class="value">Alloy</td>
</tr>
</tbody>
</table>
</td>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Front Tyre Description</td>
<td class="value">215/55 R16</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Front Rim Description</td>
<td class="value">16x7.0</td>
</tr>
</tbody>
</table>
</td>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Rear Tyre Description</td>
<td class="value">215/55 R16</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Rear Rim Description</td>
<td class="value">16x7.0</td>
</tr>
</tbody>
</table>
</td>
<td></td>
</tr>
</tbody>
</table>
</div>
</div>
表2:
<div class="bh_collapsible-body" style="display: none;">
<table border="0" cellpadding="2" cellspacing="2" class="prop-list">
<tbody>
<tr>
<td class="item">
<table>
<tbody>
<tr>
<td class="label">Steering</td>
<td class="value">Rack and Pinion</td>
</tr>
</tbody>
</table>
</td>
<td></td>
</tr>
</tbody>
</table>
</div>
</div>
我尝试过的事情:
我尝试从Xpath获取第一个表的内容,但同时给出了值和标签。
table1 = driver.find_element_by_xpath("//*[@id='features']/div/div[5]/div[2]/div[1]/div[1]/div/div[2]/table/tbody/tr[1]/td[1]/table/tbody/tr/td[2]")
我试图拆分数据,但没有成功
答案 0 :(得分:1)
我认为您正在寻找CSS选择器writer = pd.ExcelWriter("file_name.xlsx")
df.to_excel(writer, 'Sheet1',startrow = 1)
workbook1 = writer.book
worksheets = writer.sheets
worksheet1 = worksheets['Sheet1']
worksheet1.write(0, 0, df.columns.name)
writer.save()
writer.close()
,它将选择最里面的tr:not(:has(tr))
:
<tr>
打印:
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser') # the variable data contains string for Table1 and Table2 in your question
rows = []
for tr in soup.select('tr:not(:has(tr))'):
rows.append([td.get_text(strip=True) for td in tr.select('td')])
for row in zip(*rows):
print(''.join('{: ^25}'.format(d) for d in row))
变量 Rim Material Front Tyre Description Front Rim Description Rear Tyre Description Rear Rim Description Steering
Alloy 215/55 R16 16x7.0 215/55 R16 16x7.0 Rack and Pinion
包含:
rows
进一步阅读:
编辑:从CSS选择器更改为[['Rim Material', 'Alloy'],
['Front Tyre Description', '215/55 R16'],
['Front Rim Description', '16x7.0'],
['Rear Tyre Description', '215/55 R16'],
['Rear Rim Description', '16x7.0'],
['Steering', 'Rack and Pinion']]