Question

我需要使用lxml.html查找代码的位置或全文。例如：

[some html code] </body > [some html code]

我需要返回：</body >或者本文的位置。

我该怎么做？以下代码不起作用。

page = fromstring(html)
for s in page.findall('.//body'):
    print s.tag, s.text, s.attrib

Answer 1

我在下面定义了一个python函数，它将搜索给定文件中的给定字符串，并在找到该字符串时打印行号和行内容。

def find_position(word, file):
    line_number = 0
    for line in open(file):
        line_number += 1
        if word in line:
            print "%d - %s" % (line_number, line)

这里word将字词作为字符串进行搜索，文件将字段的路径作为字符串。我在下面给出了例子。

find_position("body", "/home/user/page1.html")

<强>输出

24 - <body>
28 - </body>

Python找到HTML标记位置

1 个答案: