How can I get only the parent element's own text

Time: 2018-04-13 23:16:39

Tags: python html selenium selenium-webdriver xpath

<div class="island biz-owner-reply clearfix">

    <div class="biz-owner-reply-header arrange arrange--6">
        <div class="arrange_unit biz-owner-reply-photo">
            <div class="photo-box pb-30s">
                <a href="https://s3-media1.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/o.jpg">
                    <img alt="Beckie F." class="photo-box-img" height="30" src="https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/30s.jpg" srcset="https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/90s.jpg 3.00x,https://s3-media4.fl.yelpcdn.com/buphoto/QdBQ1FI9os4heZH9rFAV6Q/ss.jpg 1.33x" width="30">
                </a>
            </div>
        </div>
        <div class="arrange_unit arrange_unit--fill embossed-text-white">
            <strong>
                Comment from Beckie F. of Yard House
            </strong>
            <br>
            Business Customer Service
        </div>
    </div>
    <span class="bullet-after">4/4/2018</span>

    Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!

    <div class="review-footer clearfix"></div>
</div>

I am trying to get the text of the element with class biz-owner-reply using Selenium in Python. I first find the element and then try to read its value as follows:

response = ""
responses = review_wrappers[0].find_elements_by_class_name("biz-owner-reply")
if len(responses) > 0:
    response = responses[0].text

However, the result also contains the text of its child elements:

'response':'Comment from Beckie F. of Yard House\nBusiness Customer Service\n4/4/2018 Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!'

How can I get only:

Hi Kim. We are happy to be apart of the community. Thank you for the warm welcome!

2 answers:

Answer 0: (score: 1)

Selenium cannot return a text node, only an element node, so we need JavaScript and the HTML DOM API to achieve your goal.

script = """
    // childNodes returns every direct child node of the element
    // nodeType === 3 keeps only text nodes (text directly inside the tag)
    return Array.from(arguments[0].childNodes)
        .filter(function(node){return node.nodeType === 3;})
        .map(function(node){return node.nodeValue;})
        .join('');
"""
# Other nodeType values: 1 is an element node (an HTML tag),
# 2 is an attribute node (an attribute of an HTML tag).

# Newer Selenium 4 releases removed the find_element_by_* helpers;
# there, use driver.find_element(By.CSS_SELECTOR, "div.biz-owner-reply").
ele = driver.find_element_by_css_selector("div.biz-owner-reply")

# strip() drops the whitespace surrounding the text nodes
txt = driver.execute_script(script, ele).strip()
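
If you would rather stay in Python, a rough alternative is to take the parent's text and remove the text of each direct child from it. This is only a sketch: `own_text` is a hypothetical helper name, and it assumes each child's `.text` appears verbatim inside the parent's `.text`, which can fail if the same string also occurs in the parent's own text.

```python
def own_text(element):
    """Return element.text with the text of its direct children removed.

    `element` is expected to behave like a Selenium WebElement: it needs a
    `.text` attribute and a `find_elements(by, value)` method.
    """
    text = element.text
    # "xpath" is the value of selenium's By.XPATH; "./*" selects direct children.
    for child in element.find_elements("xpath", "./*"):
        child_text = child.text
        if child_text:
            # Remove only the first occurrence of this child's text.
            text = text.replace(child_text, "", 1)
    return text.strip()
```

With the element from the question, `own_text(responses[0])` should leave just the reply message, as long as the assumption above holds.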

More details on HTML DOM Node

More details on HTML DOM NodeList
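
To see the same "direct text nodes only" filter without a browser, here is a minimal sketch using Python's standard `html.parser` instead of the DOM API. `DirectTextParser` and `direct_text` are hypothetical names, the sample markup is a shortened version of the HTML in the question, and the depth counting assumes every non-void tag is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DirectTextParser(HTMLParser):
    """Collects text that sits directly inside the outermost element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # Depth 1 means the text is a direct child of the outermost element.
        if self.depth == 1:
            self.parts.append(data)


def direct_text(outer_html):
    parser = DirectTextParser()
    parser.feed(outer_html)
    parser.close()
    # Collapse runs of whitespace left over from the markup's indentation.
    return " ".join("".join(parser.parts).split())


sample = (
    '<div class="island biz-owner-reply clearfix">'
    '<div class="biz-owner-reply-header">'
    '<strong>Comment from Beckie F. of Yard House</strong>'
    '<br>Business Customer Service</div>'
    '<span class="bullet-after">4/4/2018</span>'
    ' Hi Kim. Thank you for the warm welcome! '
    '<div class="review-footer clearfix"></div>'
    '</div>'
)
print(direct_text(sample))
```

In a Selenium session the input string could come from `element.get_attribute("outerHTML")`.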

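Answer 1: (score: 0)

The question is a little unclear, but I have the same idea as Yong. You only want the core text of your message, yet your result includes everything in the reply.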

For example, if your sql had only three columns:

id,date,text

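and you extract the text the way you currently do... you will get all of the text.

If you only want to extract the comment, I guess you need a sql or xml file with #core_message: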

answers = $ core_message

I would need more information, but the idea is to call just the single element rather than all of the information...