Extracting a list of text values with CasperJS

Date: 2016-04-06 07:13:27

Tags: javascript web-scraping casperjs text-extraction html-content-extraction

I want to extract the text values from this list:

<ul class="standardSuggestions">
    <li class="">

        <div id="idac">
            <span class="email" id="idb7"><span>mail-fuer-chrisko</span>@<span>web.de</span></span>
            <span class="btn-positioner"><span class="btn-wrapper btn-fix btn-service btn-xs"><input name="wishnamePanel:suggestionsContainerWrapper:freeMailSuggestionsPanel:standard-suggestion-list:suggestionRepeaterContainer:suggestion-to-repeat:1:suggestion:subForm:select-email" id="idae" value="Übernehmen" type="submit"></span></span>
        </div>

    </li><li class="">

        <div id="idaf">
            <span class="email" id="idb8"><span>post-fuer-chrisko</span>@<span>web.de</span></span>
            <span class="btn-positioner"><span class="btn-wrapper btn-fix btn-service btn-xs"><input name="wishnamePanel:suggestionsContainerWrapper:freeMailSuggestionsPanel:standard-suggestion-list:suggestionRepeaterContainer:suggestion-to-repeat:2:suggestion:subForm:select-email" id="idb0" value="Übernehmen" type="submit"></span></span>
        </div>

    </li><li class="">

        <div id="idb1">
            <span class="email" id="idb9"><span>chrisko1</span>@<span>web.de</span></span>
            <span class="btn-positioner"><span class="btn-wrapper btn-fix btn-service btn-xs"><input name="wishnamePanel:suggestionsContainerWrapper:freeMailSuggestionsPanel:standard-suggestion-list:suggestionRepeaterContainer:suggestion-to-repeat:3:suggestion:subForm:select-email" id="idb2" value="Übernehmen" type="submit"></span></span>
        </div>

    </li><li class="">

        <div id="idb3">
            <span class="email" id="idba"><span>chrisko.1</span>@<span>web.de</span></span>
            <span class="btn-positioner"><span class="btn-wrapper btn-fix btn-service btn-xs"><input name="wishnamePanel:suggestionsContainerWrapper:freeMailSuggestionsPanel:standard-suggestion-list:suggestionRepeaterContainer:suggestion-to-repeat:4:suggestion:subForm:select-email" id="idb4" value="Übernehmen" type="submit"></span></span>
        </div>

    </li>
</ul>

The problem is that the div id="" values change on every reload, so I'm not sure how to select the right elements. I tried the following function:

casper.then(function(){
    var listItems = this.evaluate(function () {
        var nodes = document.querySelectorAll('ul > li');
        return [].map.call(nodes, function(node) {
            return {
                text: node.querySelector("span").textContent
            };
        });
    });
    this.echo(JSON.stringify(listItems, undefined, 4)); 
});

The echo prints "null" :-(

1 answer:

Answer 0 (score: 1)

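Your iteration over the elements is correct. The only way to get a null value back from the page context is if an error occurred there. The only part of the code that can produce an error is node.querySelector("span").textContent, because node does not necessarily have a <span> descendant. If it doesn't, querySelector returns null, reading .textContent from it fails with a TypeError, and you get null.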
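To confirm that a page-context error is what produces the null, one option is to listen for CasperJS's 'page.error' event, which is emitted when the page leaves a JavaScript error uncaught. The following is a minimal sketch; the helper name logPageErrors is made up for illustration, and it assumes a casper instance has already been created:

```javascript
// Hypothetical helper (not part of CasperJS): registers a listener that
// prints uncaught page-context errors to the CasperJS console.
function logPageErrors(casper) {
    // 'page.error' is a CasperJS event; the handler receives the error
    // message and a stack trace array.
    casper.on('page.error', function (msg, trace) {
        casper.echo('Page error: ' + msg, 'ERROR');
    });
}

// Usage in a CasperJS script (assumes `casper` exists):
// logPageErrors(casper);
```

With this listener registered before the step runs, a failing querySelector("span").textContent should show up as a TypeError message instead of a silent null.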

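The limited markup you have shown always contains a <span> inside every <li>, so there must be another <ul> on the page whose <li> elements have no <span> descendants. You have to find a CSS selector that does not match that other <ul> element.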

I suggest

var nodes = document.querySelectorAll('ul.standardSuggestions > li');
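Putting the narrower selector together with a guard against missing spans might look like the sketch below. The function name extractSuggestions and the optional root parameter are made up for illustration; the span.email selector is taken from the markup in the question, and returning null for items without a span is just one possible choice.

```javascript
// Hypothetical helper: collects the e-mail text of every suggestion item.
// `root` is optional; inside casper.evaluate() it is undefined, so the
// function falls back to the page's `document`.
function extractSuggestions(root) {
    root = root || document;
    // Only match <li> elements of the suggestions list, not other lists.
    var nodes = root.querySelectorAll('ul.standardSuggestions > li');
    return [].map.call(nodes, function (node) {
        var span = node.querySelector('span.email');
        // Guard against a missing <span> instead of throwing a TypeError.
        return { text: span ? span.textContent : null };
    });
}

// Usage in a CasperJS script (assumes `casper` exists):
// casper.then(function () {
//     var listItems = this.evaluate(extractSuggestions);
//     this.echo(JSON.stringify(listItems, undefined, 4));
// });
```

Because the selection goes through the class names rather than the div ids, the changing ids on each reload should not matter.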