Question

Environment: Python 3 + lxml.
The input file is test.html; its contents are listed further down.

The goal is to extract all text from all nodes in test.html.

With the XPath expression tree.xpath('//*/text()'), the text under every HTML tag is output, including the script contents. The test.html in question:

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
    <p>it is a test for xpath</p>
    <a>it is a test for xpath</a>
    <script>$(function(){
    $.ajax({
            url: "/account/getUserInfo",
            async:false,
            dataType:"json",
            success: function(data) {  }
            }
    })
});</script>
</body>
</html>

The script text can be extracted with texts = tree.xpath('//script/text()'). Now I want to extract the text of all nodes except script.

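import lxml.html

# Parse the file into an element tree.
tree = lxml.html.parse("test.html")
# Collect the text nodes of every element, <script> included.
texts = tree.xpath('//*/text()')
for text in texts:
    print(text)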

Neither of them works. How can this be fixed?
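
One possible direction, offered as a minimal sketch rather than a confirmed answer: XPath 1.0 lets a predicate test the parent of each text node, so script text can be filtered out inside the expression itself. The inline HTML string below is a shortened stand-in for test.html (the script body is a placeholder, not the original), and the names HTML and visible are illustrative only.

```python
import lxml.html

# Shortened stand-in for test.html so the sketch runs without a file on disk.
# The script body is a placeholder, not the original contents.
HTML = """<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
    <p>it is a test for xpath</p>
    <a>it is a test for xpath</a>
    <script>$(function(){ console.log("hidden"); });</script>
</body>
</html>"""

tree = lxml.html.fromstring(HTML)

# Select every text node whose parent element is not <script>.
texts = tree.xpath('//text()[not(parent::script)]')

# Drop whitespace-only nodes (indentation and newlines between tags).
visible = [t.strip() for t in texts if t.strip()]

for text in visible:
    print(text)
```

An expression of the form '//*[not(self::script)]/text()' should select the same nodes by filtering on the element instead of on the text node. If style contents are also unwanted, the predicate could presumably be extended to not(parent::script or parent::style).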

How can I select the text of nodes that are not script with XPath in Python 3's lxml.html?

0 answers: