XPath: selecting HTML elements whose class attribute contains multiple spaces and newlines

Time: 2015-08-05 17:03:38

Tags: html xml xpath

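I am trying to select divs by a class attribute whose value contains multiple spaces and newlines. A snippet is shown below. I want to select every div whose class looks like `test-one topit`: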

<div class="test-one
                    topit
        ">


        <div class='test-one a'>1
        </div>
        <div class='topit'>2
        </div>
</div>

<div class="test-one
                    topit
        ">


        <div class='test-one a'>1
        </div>
        <div class='topit'>2
        </div>
</div>

Here is what I have tried:

"//div[contains(concat(' ', normalize-space(@class), ' '), ' topranks ') and contains(concat(' ', normalize-space(@class), ' '), ' list-node ')]"

//*[contains(concat(' ', normalize-space(@class), ' '), ' atag ')]

Sources I tried to build on:

XPath - How to select by @text that contains new line

How can I match on an attribute that contains a certain string?

1 Answer:

Answer 0 (score: 1)

Use cssselect, which translates CSS class selectors into the equivalent XPath:

import cssselect
import lxml.html

cssselect.GenericTranslator().css_to_xpath('div.test-one.topit')
# "descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' test-one ') and (@class and contains(concat(' ', normalize-space(@class), ' '), ' topit '))]"
tree = lxml.html.parse('http://www.made-in-china.com/companysearch.do?xcase=hunt&order=0&style=b&page=1&word=bag&size=30&sizeHasChanged=0&memberLevel=blank&sgsMembershipFlag=&comProvince=nolimit&comCity=&cateCode=&comBusinessType=blank&numEmployees=&annualRevenue=&code=0&managementCertification=').getroot()

tree.cssselect('div.list-node.topranks')
# [<Element div at 0x7f62e732dd18>, <Element div at 0x7f62e72d1f48>, <Element div at 0x7f62e72eb188>, <Element div at 0x7f62e72eb0e8>, <Element div at 0x7f62e72eb138>, <Element div at 0x7f62e72eb1d8>, <Element div at 0x7f62e72eb228>, <Element div at 0x7f62e72eb278>, <Element div at 0x7f62e72eb2c8>, <Element div at 0x7f62e72eb318>]
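
As a minimal sketch, the same normalize-space() idiom can also be run directly against the snippet from the question with lxml, without going through cssselect. The helper name `has_class` and the inline `HTML` string are assumptions made for illustration; they are not part of the original answer.

```python
import lxml.html

# The snippet from the question: the outer class value spans several lines.
HTML = """
<div class="test-one
                    topit
        ">
        <div class='test-one a'>1
        </div>
        <div class='topit'>2
        </div>
</div>
<div class="test-one
                    topit
        ">
        <div class='test-one a'>1
        </div>
        <div class='topit'>2
        </div>
</div>
"""


def has_class(name):
    """Hypothetical helper: XPath predicate matching one whitespace-separated class token."""
    # normalize-space() collapses runs of spaces/newlines into single spaces and
    # trims the ends; padding with spaces lets us match whole tokens only.
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % name


# document_fromstring adds the missing <html>/<body> wrappers around the fragment.
root = lxml.html.document_fromstring(HTML)
xpath = "//div[%s and %s]" % (has_class("test-one"), has_class("topit"))
matches = root.xpath(xpath)

print(len(matches))
# Show each matched class value with its whitespace collapsed.
print([" ".join(m.get("class").split()) for m in matches])
```

Only the two outer divs carry both tokens, so the inner `test-one a` and `topit` divs should not be selected. Building the predicate per class token keeps the quoting in one place, which avoids unbalanced-quote mistakes when several classes are combined with `and`.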