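I have a table that can be found here: Ontario Gov Employee Directory. I'm trying to iterate over the table to extract its data, but I'm having trouble finding an XPath that lets me do so.
When I inspect the element, I see that the table has no id: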
<table title="results_list" border="0" width="100%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="content" valign="top" align="right" width="50">1. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("32528")'>Aagaard, Lindsay</a>] [ Senior Policy Advisor ] [TREASURY BOARD SECRETARIAT]
<br>[DEPUTY PREMIER AND PRESIDENT OF THE TREASURY BOARD, Toronto]
<!-- [416-327-0948] -->
[416-327-0948] [
<a href="mailto:lindsay.aagaard@ontario.ca">
lindsay.aagaard@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">2. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("34417")'>Aalto, Margaret</a>] [ Probation Officer ] [CHILDREN AND YOUTH SERVICES]
<br>[THUNDER BAY, Thunder Bay]
<!-- [807-475-1310] -->
[807-475-1310] [
<a href="mailto:margaret.aalto@ontario.ca">
margaret.aalto@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">3. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("9187")'>Aarlaht, Andrew</a>] [ Business Analyst ] [COMMUNITY AND SOCIAL SERVICES]
<br>[HAMILTON, BUSINESS SERVICES UNIT, Hamilton]
<!-- [905-521-7335] -->
[905-521-7335] [
<a href="mailto:andrew.aarlaht@ontario.ca">
andrew.aarlaht@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">4. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("9187")'>Aarlaht, Andrew</a>] [ Business Analyst ] [CHILDREN AND YOUTH SERVICES]
<br>[HAMILTON, BUSINESS SERVICES UNIT, Hamilton]
<!-- [905-521-7335] -->
[905-521-7335] [
<a href="mailto:andrew.aarlaht@ontario.ca">
andrew.aarlaht@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">5. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("19146")'>Aarons, Drew</a>] [ Messenger ] [LEGISLATIVE OFFICES]
<br>[PARLIAMENTARY PROTOCOL, Toronto]
<!-- [416-325-7455] -->
[416-325-7455] [
<a href="mailto:daarons@ola.org">
daarons@ola.org</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">6. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("113729")'>Aaswaakshin, Neegann</a>] [ Articling Student ] [ABORIGINAL AFFAIRS]
<br>[LEGAL SERVICES, Toronto]
<!-- [416-212-2271] -->
[416-212-2271] [
<a href="mailto:Neegann.Aaswaakshin@ontario.ca">
Neegann.Aaswaakshin@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">7. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("32196")'>Abad, Lilian</a>] [ Executive Assistant ] [TRANSPORTATION]
<br>[GO TRANSIT, Toronto]
<!-- [416-202-5506] -->
[416-202-5506] [
<a href="mailto:lilian.abad@gotransit.com">
lilian.abad@gotransit.com</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">8. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("114240")'>Abadesso, Jennifer</a>] [ Employment Program Consultant (Acting) ] [TRAINING, COLLEGES AND UNIVERSITIES]
<br>[FOUNDATION SKILLS, Toronto]
<!-- [416-327-2065] -->
[416-327-2065] [
<a href="mailto:jennifer.abadesso@ontario.ca">
jennifer.abadesso@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">9. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("104293")'>Abakunzi, Louis</a>] [ Customer Service Representative (Bilingual) ] [GOVERNMENT AND CONSUMER SERVICES]
<br>[SERVICEONTARIO CONTACT CENTRE - NORTH YORK, Toronto]
<!-- [416-235-2999] -->
[416-235-2999] [
<a href="mailto:Louis.K.Abakunzi@ontario.ca">
Louis.K.Abakunzi@ontario.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
<tr>
<td class="content" valign="top" align="right" width="50">10. </td>
<td class="content">[<a class="results" href='javascript:showEmployeeDetail("19309")'>Aban, Edencio</a>] [ Audit Supervisor ] [ATTORNEY GENERAL]
<br>[AUDIT AND COMPLIANCE, Toronto]
<!-- [416-326-6295] -->
[416-326-6295] [
<a href="mailto:edencio.aban@agco.ca">
edencio.aban@agco.ca</a>]
</td>
</tr>
<tr>
<td> </td>
</tr>
</tbody>
</table>
How can I iterate over the data in these rows?
Answer 0 (score: 1)
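It's a table inside a table, with fairly standard formatting after that. What challenge are you running into?
"When I inspect the element, I see that the table has no id:"
The table has other attributes you can use instead, such as title. Use the XPath //table[@title="results_list"]/tbody/tr/td to find every data cell in the innermost table, or drop the trailing /td from the XPath to get each row. Then find each td element under the row and use its text.
Note: in the innermost table, the first column holds the serial number and the second column holds the actual data. I suggest getting each td and then reading its innerHTML attribute or elem.text. After that, use regular expressions to extract the individual parts.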
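>>> from selenium.webdriver.common.by import By
>>> # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...) instead
>>> all_tdata = driver.find_elements(By.XPATH, '//table[@title="results_list"]/tbody/tr/td')
>>> for td in all_tdata:
...     html = td.get_attribute('innerHTML')  # raw HTML of the cell; save it and regex it
...     # or
...     data = td.text  # visible text only, without tags or HTML comments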
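The regular-expression step could look roughly like the sketch below. It assumes you work with td.text rather than innerHTML (so the tags and the commented-out phone number are already gone), and that every entry has exactly six bracketed groups, as in the sample rows. The helper name parse_entry and the field labels are made up for illustration; they are not names used by the site.

```python
import re

# One "[ ... ]" group; the capture is whatever sits between the brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")

# Field labels are guesses based on the sample rows, not names used by the site.
FIELD_NAMES = ("name", "title", "ministry", "office", "phone", "email")


def parse_entry(text):
    """Return a dict for one data cell, or None if the cell is not an entry."""
    # Collapse internal whitespace, since the e-mail group spans a line break.
    fields = [" ".join(raw.split()) for raw in BRACKETED.findall(text)]
    if len(fields) != len(FIELD_NAMES):
        return None  # serial-number cell, spacer cell, or unexpected layout
    return dict(zip(FIELD_NAMES, fields))


# Approximation of td.text for the first row of the table in the question.
sample = (
    "[Aagaard, Lindsay] [ Senior Policy Advisor ] [TREASURY BOARD SECRETARIAT]\n"
    "[DEPUTY PREMIER AND PRESIDENT OF THE TREASURY BOARD, Toronto]\n"
    "[416-327-0948] [\nlindsay.aagaard@ontario.ca]"
)

entry = parse_entry(sample)
print(entry)
```

Inside the loop above, calling parse_entry(td.text) on every cell should return None for the serial-number and spacer cells, so those can simply be skipped. If some entries turn out to have a different number of bracketed groups, the length check would need to be relaxed.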