This is the HTML of the table row I want to scrape:
<tr id="vsViewer1_dgMainView_dgMainView_ctl02" class="GridItem odd">
<td class=" ">
<a class="hlPopup" id="lbdgMainView$ctl02" name="lbdgMainView$ctl02" onclick="wrjl_test(this,'lbdgMainView$ctl02','746402:O9oY58XKE+w=:746402:746402')" onmouseover="this.className='HLPopupOver'" onmouseout="this.className='HLPopup'"></a>
<span class="HLPopup" id="lbldgMainView$ctl02" name="lbldgMainView$ctl02" onclick="wrjl_test(this,'lbldgMainView$ctl02','746402:O9oY58XKE+w=:746402:746402')"> Info </span>
</td>
<td align="center" class=" ">746402</td>
<td align="center" class=" ">Wyndham Orlando Resort International Drive</td>
<td align="center" class=" ">Interiano, Ana</td>
<td align="center" class=" ">Yes</td>
<td align="center" class=" ">7.32</td>
<td align="left" class=" ">
<table width="250" class="TextTableSmall" border="0">
<tbody>
<tr>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Date</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">In</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Out</td>
<td align="center" style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Hours</td>
<td style="background-color: rgb(128, 128, 128); text-align: center; font-size: 8pt;">Shift</td>
</tr>
<tr>
<td style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">Thu 10/24/19</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">8:00am</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1:20pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">5.33</td>
<td align="center" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1
<br>FL ORL Wyndham Resort I Drive 18128 - Housekeeping
<br>Room Attendant
</td>
</tr>
<tr>
<td style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">Thu 10/24/19</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1:39pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">3:38pm</td>
<td align="right" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1.98</td>
<td align="center" style="background-color: rgb(204, 204, 153); text-align: left; font-size: 8pt;">1
<br>FL ORL Wyndham Resort I Drive 18128 - Housekeeping
<br>Room Attendant
</td>
</tr>
</tbody>
</table>
</td>
<td align="right" class=" ">12.25</td>
<td class=" ">9.0000</td>
<td align="center" class=" ">1</td>
<td align="center" class=" ">Housekeeper</td>
<td align="center" class=" ">HOUSEKEEPER</td>
<td align="center" class=" ">SE-FL-Orlando</td>
<td align="center" class=" ">Wyndham Hotel Group</td>
</tr>
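The outer data row has an `id` of the form `vsViewer1_dgMainView_dgMainView_ctl02`, and the nested timesheet rows have no `id` at all. So one option is to select rows by that pattern. This is a minimal sketch. It assumes, without confirmation from the snippet, that every data row on the page follows the same `..._ctlNN` naming. The helper name `outer_rows` is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Assumed pattern: every outer data row id ends in "_ctl" followed by digits.
ROW_ID = re.compile(r"^vsViewer1_dgMainView_dgMainView_ctl\d+$")


def outer_rows(html):
    """Hypothetical helper: return only the <tr> elements whose id matches ROW_ID.

    Rows without an id (such as the nested timesheet rows) are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("tr", id=ROW_ID)
```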
This is what I have so far:
from bs4 import BeautifulSoup

# Parse the saved page
with open('vsShowViewTWO.html', encoding='utf-8') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

tbody = soup.find('tbody', id='thetbody')
table_rows = tbody.find_all('tr')
for tr in table_rows:
    td = tr.find_all('td')
    row = [i.text for i in td]
    print(row)
The result is:
['Info', '746402', 'Resort International', 'Interiano,Ana', 'Yes', '7.32', 'DateInOutHoursShiftThu 10/24/198:00am1:20 pm5.331Resort I Drive 18128-HousekeepingRoom AttendantThu 10/24/191:39pm3:38 pm1.981Resort I Drive 18128-HousekeepingRoom Attendant', 'Date', 'In', 'Out', 'Hours', 'Shift', 'Thu 10/24/19', '8:00am', '1:20 pm', '5.33', '1Resort I Drive 18128-HousekeepingRoom Attendant', 'Thu 10/24/19', '1:39 pm', '3:38 pm', '1.98', '1Resort I Drive 18128-HousekeepingRoom Attendant', '12.25', '9.0000', '1', 'Housekeeper', 'HOUSEKEEPER', 'SE', 'Hotel Group']
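The output mixes the outer cells with the cells of the nested `TextTableSmall` table because BeautifulSoup's `find_all()` searches all descendants by default. As a result, `tbody.find_all('tr')` also visits the inner timesheet rows, and `tr.find_all('td')` also returns the inner cells. The sketch below uses a hypothetical, trimmed-down snippet, not the real page, to show the difference `recursive=False` makes.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: one outer row whose middle cell
# holds a nested table, like the timesheet cell in the question.
html = """
<table><tbody id="thetbody">
  <tr>
    <td>Interiano, Ana</td>
    <td><table><tbody>
      <tr><td>Date</td><td>Hours</td></tr>
    </tbody></table></td>
    <td>HOUSEKEEPER</td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("tbody", id="thetbody")

# Default search walks every descendant, so nested rows and cells are included.
all_rows = tbody.find_all("tr")
all_cells = all_rows[0].find_all("td")

# recursive=False only looks at direct children.
outer_rows = tbody.find_all("tr", recursive=False)
outer_cells = outer_rows[0].find_all("td", recursive=False)

print(len(all_rows), len(all_cells))      # 2 5
print(len(outer_rows), len(outer_cells))  # 1 3
```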
But I don't need the whole row, only the name 'Interiano, Ana' and the last one, 'HOUSEKEEPER'. I've been trying to index the `row` variable, with no luck.
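This is a minimal sketch of one way to get just those two values. It rests on several assumptions. As in the code above, the data rows are taken to be direct children of `<tbody id="thetbody">`. Every data row is assumed to have the same 14 direct cells as the sample row, with the name at index 3 and the upper-case job title at index 11. The helper `extract_name_and_title` and the two column constants are hypothetical. `html.parser` is used only so the sketch needs no extra parser, and `'lxml'` should behave the same on this markup.

```python
from bs4 import BeautifulSoup

NAME_COL = 3    # "Interiano, Ana" in the sample row (assumed fixed position)
TITLE_COL = 11  # upper-case "HOUSEKEEPER" in the sample row (assumed fixed position)


def extract_name_and_title(html):
    """Hypothetical helper: return a (name, title) tuple for each outer data row."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", id="thetbody")
    results = []
    # recursive=False skips the <tr>/<td> elements of the nested timesheet table
    for tr in tbody.find_all("tr", recursive=False):
        cells = tr.find_all("td", recursive=False)
        if len(cells) <= TITLE_COL:
            continue  # header or otherwise shorter row; skip it
        name = cells[NAME_COL].get_text(strip=True)
        title = cells[TITLE_COL].get_text(strip=True)
        results.append((name, title))
    return results
```

To use it on the saved page, pass in the file contents, e.g. `extract_name_and_title(open('vsShowViewTWO.html', encoding='utf-8').read())`. If the real page has a different column layout from the sample row, the two index constants need adjusting.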