Question

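I used a selector in my Python script to get the text from the HTML elements given below. I tried .text to get the Shop here cheap string from the element, but it doesn't work at all. However, when I tried .text_content(), it works as expected.

My question is:

What is wrong with .text? Why can't it get the text from the element?

HTML elements: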

<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>

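What I tried:

from lxml import html

# element is assumed to hold the HTML shown above as a string
tree = html.fromstring(element)
for data in tree.cssselect(".Price__container"):
    print(data.text)  # It doesn't work at all

By the way, I don't want to fall back on .text_content(), so I'm looking for an answer that uses .text to scrape the text. Thanks in advance.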

Answer 1

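I think the root cause of the confusion is that lxml uses the .text and .tail concept to represent the content of nodes, which avoids the need for a special "text" node entity. To quote the documentation:

The two properties .text and .tail are enough to represent any text content in an XML document. This way, the ElementTree API does not require any special text nodes in addition to the Element class, that tend to get in the way fairly often (as you might know from classic DOM APIs).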

In your case, Shop here cheap is the tail of the <span class="ProductPrice_original">$6.70</span> element and is therefore not included in the .text value of the parent node.
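To make the .text / .tail split concrete, here is a minimal sketch that prints both attributes for the container and for each child. It looks the container up with XPath rather than cssselect only to avoid the extra cssselect dependency; the variable names (content, container, last_tail) are illustrative and not part of the original post.

```python
from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

tree = html.fromstring(content)
container = tree.xpath('//div[@class="Price__container"]')[0]

# .text covers only what precedes the first child, so no visible text here.
print(repr(container.text))

# Each child carries its own .text plus the .tail that follows its closing tag.
for child in container:
    print(child.get("class"), repr(child.text), repr(child.tail))

# The string in question is the tail of the last <span>.
last_tail = container[-1].tail.strip()
print(last_tail)
```

The last line prints Shop here cheap, which shows that the string hangs off the second span as its tail rather than off the div as its text.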

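Besides other methods such as .text_content(), you can reach the tail by getting all the top-level text nodes non-recursively:

print(''.join(data.xpath("./text()")).strip())

Or get just the last top-level text node:

print(data.xpath("./text()[last()]")[0].strip())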

Answer 2

Another approach could be as follows:

from lxml import html

content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

tree = html.fromstring(content)
for data in tree.cssselect(".Price__container"):
    # Dropping each child merges its tail into the parent's .text
    for item in list(data):
        item.drop_tree()
    print(data.text.strip())

Output:

Shop here cheap
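One caveat of the drop_tree() approach is that it modifies the parsed tree, so the price spans are gone afterwards. If the spans are still needed, a non-destructive variant is to concatenate the parent's .text with each child's .tail. The sketch below assumes a hypothetical helper named own_text (not an lxml function) and uses XPath instead of cssselect to keep dependencies minimal.

```python
from lxml import html


def own_text(element):
    """Return the text directly inside element, skipping text inside its children."""
    parts = [element.text or ""]
    # A child's .tail is the text that follows it, still inside the parent.
    parts.extend(child.tail or "" for child in element)
    return "".join(parts).strip()


content = """
<div class="Price__container">
    <span class="ProductPrice" itemprop="price">$6.35</span>
    <span class="ProductPrice_original">$6.70</span>
    Shop here cheap
</div>
"""

tree = html.fromstring(content)
container = tree.xpath('//div[@class="Price__container"]')[0]

result = own_text(container)
print(result)

# The tree is left untouched: both spans are still children of the div.
print(len(container))
```

Because nothing is removed, the same container can still be queried for the prices after the loose text has been read.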
