HTML:
<tbody>
<tr >
<td> Tim Cook </td>
<td class="wpsTableNrmRow" > Apple CEO
<a href="applicatiodetailaddress"> all CEOs </a> <!-- Not required: this node -->
</td>
</tr>
<tr >
<td> Sundar Pichai </td>
<td class="wpsTableNrmRow" > Google CEO </td>
</tr>
<tr >
<td> NoCompany </td>
<td class="wpsTableNrmRow" > NOT, DEFINED</td>
</tr>
</tbody>
Code:
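from selenium.webdriver.common.by import By  # Selenium 4 replacement for find_elements_by_xpath

applicationData = [td.text for td in webBrowser.find_elements(By.XPATH, '//td[@class="wpsTableNrmRow"]')]
# Dictionary keys must be unique; repeating 'Designation' would keep only the last value
record = {'Designation1': applicationData[0],
          'Designation2': applicationData[1], 'Designation3': applicationData[2]}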
Output:
Designation: Apple CEO all CEOs   (the trailing "all CEOs" is not wanted)
Designation: Google CEO
Designation: NOT, DEFINED
Answer 0 (score: 1)
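from selenium.webdriver.common.by import By
# .strip() removes the leading space that the first line of textContent keeps
applicationData = [td.get_attribute("textContent").split("\n")[0].strip() for td in webBrowser.find_elements(By.XPATH, '//td[@class="wpsTableNrmRow"]')]
record = {'Designation1': applicationData[0], 'Designation2': applicationData[1], 'Designation3': applicationData[2]}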
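Try the code above. It reads textContent, which returns all of the cell's text, including the line breaks from the page source. Because the unwanted link text starts on a new line, splitting on "\n" and keeping the first piece leaves only "Apple CEO". Note that this relies on the link starting on its own line in the markup.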
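If the link is not guaranteed to start on a new line, a more robust option is to keep only the text nodes that are direct children of the cell and skip anything inside nested elements like the a tag. Below is a minimal sketch of that idea using Python's standard html.parser, run against a cleaned-up copy of the HTML from the question. The class name DirectTextExtractor and the SAMPLE string are illustrative assumptions, not part of Selenium or of the original answer.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class DirectTextExtractor(HTMLParser):
    """Collect only the direct text children of <td class="wpsTableNrmRow">.

    Text inside nested elements (such as the <a> link) is skipped.
    """

    def __init__(self, target_class="wpsTableNrmRow"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside a matching <td>; 0 = outside
        self.parts = []     # text fragments of the cell currently being read
        self.results = []   # one cleaned string per matching cell

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1  # entering a child element of the cell
        elif tag == "td":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.parts = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching <td> has just closed
            # Collapse runs of whitespace and trim both ends
            self.results.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth == 1:  # text directly inside the <td>, not in a child
            self.parts.append(data)


SAMPLE = """
<table><tbody>
<tr><td> Tim Cook </td>
<td class="wpsTableNrmRow"> Apple CEO
<a href="applicatiodetailaddress"> all CEOs </a>
</td></tr>
<tr><td> Sundar Pichai </td>
<td class="wpsTableNrmRow"> Google CEO </td></tr>
<tr><td> NoCompany </td>
<td class="wpsTableNrmRow"> NOT, DEFINED</td></tr>
</tbody></table>
"""

parser = DirectTextExtractor()
parser.feed(SAMPLE)
parser.close()

record = {f"Designation{i}": text for i, text in enumerate(parser.results, start=1)}
print(record)
# {'Designation1': 'Apple CEO', 'Designation2': 'Google CEO', 'Designation3': 'NOT, DEFINED'}
```

In a Selenium script, webBrowser.page_source could be passed to feed() instead of SAMPLE. The trade-off is that the page is parsed a second time in Python rather than queried through the browser, but the result no longer depends on where the line breaks fall in the markup.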