Question

HTML:

<tbody>
       <tr >
           <td> Tim Cook </td>
           <td class="wpsTableNrmRow" > Apple CEO
               <a href="applicatiodetailaddress"> all CEOs </a> <!-- this node is not required -->
           </td>
       </tr>
       <tr >
           <td> Sundar Pichai </td>
           <td class="wpsTableNrmRow" > Google CEO </td>
       </tr>
       <tr >
           <td> NoCompany </td>
           <td class="wpsTableNrmRow" > NOT, DEFINED</td>
       </tr>
</tbody>

Code:

applicationData = [td.text for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow"]')]
# One dict per row: repeating the 'Designation' key in a single dict would keep only the last value
records = [{'Designation': text} for text in applicationData]

Output:

 Designation: Apple CEO all CEOs  // 'all CEOs' is not wanted
 Designation: Google CEO
 Designation: NOT, DEFINED

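I am scraping data from a table and want each cell's own text, without the text of the nested <a> tag.

How can I do that?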

I tried:

[td.get_attribute("textContent").split("\n")[0] for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow" and text()!=" "]')]

Output:

 Designation: Apple CEO  
 Designation: Google CEO
 Designation:           // should have value 'NOT, DEFINED'

How can I get that value?

Answer 1

applicationData = [td.get_attribute("textContent").split("\n")[0] for td in webBrowser.find_elements_by_xpath('//td[@class="wpsTableNrmRow"]')]
record = {'Designation1': applicationData[0], 'Designation2': applicationData[1]}

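Try the code above. It reads textContent, which returns the cell's raw text including the line breaks from the markup. Because the cell's own text and the <a> text sit on different lines, you can split on "\n" and keep the first piece.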
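One caveat with split("\n")[0]: if a cell's markup has a line break before its text, the first piece is empty, which may be what produced the blank third value in the question (an assumption, since the posted markup shows no such line break). The sketch below takes the first non-blank line instead. The helper names own_text and scrape_designations and the sample strings are hypothetical, not part of the original answer, and the code still assumes the cell's own text comes before the <a> tag.

```python
def own_text(text_content):
    """Return the first non-blank line of a textContent string, stripped."""
    for line in (text_content or "").splitlines():
        line = line.strip()
        if line:
            return line
    return ""


def scrape_designations(webBrowser):
    """Collect the own text of every td.wpsTableNrmRow cell."""
    # "xpath" is the value of Selenium's By.XPATH constant. find_elements(by, value)
    # is used because find_elements_by_xpath was removed in newer Selenium releases.
    cells = webBrowser.find_elements("xpath", '//td[@class="wpsTableNrmRow"]')
    return [own_text(td.get_attribute("textContent")) for td in cells]


# Assumed textContent values for the three cells in the question; the leading
# line break in the last one is hypothetical.
samples = [
    " Apple CEO\n                all CEOs \n           ",
    " Google CEO ",
    "\n                NOT, DEFINED",
]
print([own_text(s) for s in samples])
# ['Apple CEO', 'Google CEO', 'NOT, DEFINED']
```

With a live driver, applicationData = scrape_designations(webBrowser) would replace the list comprehension in the answer.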

How to skip the <a> tag when scraping data with Selenium
