Python 3 - Selenium - Scraping data from nested divs

Time: 2016-03-23 23:45:08

Tags: python selenium selenium-webdriver

I am new to Python and I am trying to check which div class appears first on a page. I have done this with table rows, but I can't seem to work it out for divs.

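What I want to determine is whether the most recent update was an email being sent (<div class="EMAIL SENT">) or a note being added (<div class="Notes">). The most recent item is shown first, at the top, but other actions may have been performed since then, for example <div class="Updated">.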

I haven't managed to write any code that does this, or that even comes close, but in my head I imagine it would work something like this.

for sub_div_classes in browser.find_element_by_class_name('cb'):
    classname = ...  # check name of sub_div_class
    if classname == "EMAIL SENT":
        class_info = browser.find_element_by_class_name('plus_header_Additional_info').text
        print(class_info)  # output: EMAIL SENT&nbsp;:Email sent on 20-03-2016 00:22:09 by [REDACTED]
        trigger_1()
    if classname == "Notes":
        trigger_2()
    # move on to next div class in list

Below is the HTML of the page I am trying to work with. I would really appreciate any advice or help anyone can offer.

<div class="cb" style="margin:5px 0 0 0;">
                              <div class="Updated">
                               <div class="plus_header_Additional_info">Updated&nbsp;:Incident Updated on 20-03-2016 00:22:52 by User = [REDACTED]

                                 <a href="javascript:toggle2('contentDivImg2_0', 'imageDivLink2_0');" id="imageDivLink2_0"><img src="images/minus.png" style="float:right;"></a> 
                               </div>
                                   <div class="plus_content" style="display: block;" id="contentDivImg2_0">
                               <div> 
                                             Assigned to STRIKE1, 
 by User = [REDACTED].
                                </div>
                                <br>
                            </div>
                                </div>
                              <div class="Updated">
                               <div class="plus_header_Additional_info">Updated&nbsp;:PEND CLIENT STRIKE - 1 added on 20-03-2016 00:22:36 by [REDACTED]. 
                                 <a href="javascript:toggle2('contentDivImg2_1', 'imageDivLink2_1');" id="imageDivLink2_1"><img src="images/minus.png" style="float:right;"></a> 
                               </div>
                                   <div class="plus_content" style="display: block;" id="contentDivImg2_1">
                               <div> 
                                </div>
                                <br>
                            </div>
                                </div>
                              <div class="EMAIL SENT">
                               <div class="plus_header_Additional_info">EMAIL SENT&nbsp;:Email sent on 20-03-2016 00:22:09 by [REDACTED] 
                                 <a href="javascript:toggle2('contentDivImg2_2', 'imageDivLink2_2');" id="imageDivLink2_2"><img src="images/minus.png" style="float:right;"></a> 
                               </div>
                                   <div class="plus_content" style="display: block;" id="contentDivImg2_2">
                               <div> 
                                            To :- [NAME]@[DOMAIN].CO.UK Subject: Ticket - [IN-000999999] Description : Dear User,

[REDACTED]
                                </div>
                                <br>
                            </div>
                                </div>
                              <div class="Updated">
                               <div class="plus_header_Additional_info">Updated&nbsp;:Incident Updated on 12-03-2016 10:56:15 by User = [REDACTED]

                                 <a href="javascript:toggle2('contentDivImg2_3', 'imageDivLink2_3');" id="imageDivLink2_3"><img src="images/minus.png" style="float:right;"></a> 
                               </div>
                                   <div class="plus_content" style="display: block;" id="contentDivImg2_3">
                               <div> 
                                             Status:- PROGRESSING changed to PEND CLIENT, 
 Assigned to SOFTWARE DEPLOYED, 
 by User = [REDACTED].
                                </div>
                                <br>
                            </div>
                                </div>
                              <div class="Notes">
                               <div class="plus_header_Additional_info">Notes&nbsp;:Notes Added on 12-03-2016 10:55:53 by [REDACTED]. 
                                 <a href="javascript:toggle2('contentDivImg2_4', 'imageDivLink2_4');" id="imageDivLink2_4"><img src="images/minus.png" style="float:right;"></a> 
                               </div>
                                   <div class="plus_content" style="display: block;" id="contentDivImg2_4">
                               <div> 
                                            <textarea id="notes4" name="notes1" cols="" class="emailForm_input1" style="width: 97%; overflow: hidden; word-wrap: break-word; resize: horizontal; height: 237px;" readonly="readonly">Hello,
[REDACTED]
</textarea>
                                </div>
                                <br>
                            </div>
                                </div>
                </div>

1 answer:

Answer 0 (score: 0)

Use or in an XPath expression:

.xpath("//div[@class='Notes' or @class='EMAIL SENT']")[0]

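If the Notes div appears first in the page you will get Notes, and if the EMAIL SENT div appears first you will get that one instead.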

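If we modify your HTML snippet, adding some text so that one div reads <div class="EMAIL SENT">in email and changing the class of a later div so that it reads <div class="Notes">in notes,

we can see how it works using lxml: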

In [13]: from lxml.etree import fromstring, HTMLParser

In [14]: xml = fromstring(html, HTMLParser())

In [15]: xml.xpath("//div[@class='Notes' or @class='EMAIL SENT']")
Out[15]: [<Element div at 0x7f96598d4ea8>, <Element div at 0x7f96598d4ef0>]

In [16]: xml.xpath("//div[@class='Notes' or @class='EMAIL SENT']")[0].text
Out[16]: 'in email\n                               '

In [17]: xml.xpath("//div[@class='Notes' or @class='EMAIL SENT']")[1].text
Out[17]: 'in notes\n    

So with selenium, you just need to find the elements by XPath.
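
As a rough sketch of what that could look like in Selenium, something along these lines might work. The names latest_update and XPATH are made up for illustration, and driver is assumed to be an already created WebDriver that has loaded the page. The string "xpath" is the value of Selenium's By.XPATH constant; the matches should come back in document order, so the first one is the entry nearest the top of the page.

```python
# Hypothetical helper, not part of Selenium: the names below are illustrative.
XPATH = "//div[@class='Notes' or @class='EMAIL SENT']"


def latest_update(driver):
    """Return (class_name, header_text) for the first matching div, or None."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    matches = driver.find_elements("xpath", XPATH)
    if not matches:
        return None
    first = matches[0]  # assumed to be the topmost (most recent) entry
    # Look up the header div nested inside the matched div.
    header = first.find_element(
        "xpath", ".//div[@class='plus_header_Additional_info']"
    )
    return first.get_attribute("class"), header.text.strip()
```

The returned class name can then decide which action to run, e.g. calling trigger_1() when it is "EMAIL SENT" and trigger_2() when it is "Notes".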