Question

I have two input files: an HTML file and a CSS file. I want to perform some operations on the HTML file based on the contents of the CSS file.

My HTML looks like this:

<html>
    <head>
        <title></title>
    </head>
    <body>
        <p class="cl1" id="id1"> <span id="span1"> blabla</span> </p>
        <p class="cl2" id="id2"> <span id="span2"> blablabla</span> <span id="span3"> qwqwqw </span> </p>
    </body>
</html>

The styles for the span ids are defined in the CSS file (separately for each span id!).

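Before doing the real work (removing spans based on their style), I am trying to print each id from the HTML together with the style description for that id from the CSS.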

Code:

from lxml import etree

tree = etree.parse("file.html")

filein = "file.css"


def f1():

    with open(filein, 'rU') as f:
        for span in tree.iterfind('//span'):
            for line in f:
                if span and span.attrib.has_key('id'):
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print x, line

def main():
    f1()

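So there are two for loops. Each one iterates perfectly on its own, but when they are nested in this function, the iteration stops after the first pass of the outer loop: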

>> span1 span#span1 { font-weight: bold; font-size: 11.0pt; font-style: normal; letter-spacing: 0em }

How can I fix this?

Answer 1

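This happens because the file has already been read to the end by the time the second iteration of the outer loop starts. A file object is consumed as you iterate over it, so every later inner loop finds nothing left to read. To make it work, call f.seek(0) before starting each inner loop over the file: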

with open(filein, 'rU') as f:
    for span in tree.iterfind('//span'):
        f.seek(0)  # rewind to the start of the file before each inner pass
        for line in f:
            if span and span.attrib.has_key('id'):
                x = span.get('id')
                if "af" not in x and x in line:
                    print x, line
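
To see the exhaustion in isolation, here is a minimal Python 3 sketch (the code above is Python 2). It makes a few assumptions that are not in the question: io.StringIO stands in for file.css, a plain list of ids stands in for the lxml tree, and the CSS lines and the helper name match_ids are made up for illustration.

```python
import io

# Hypothetical stand-in for file.css; a real file would come from open().
CSS_TEXT = (
    "span#span1 { font-weight: bold }\n"
    "span#span2 { font-style: italic }\n"
    "span#span3 { letter-spacing: 0em }\n"
)
SPAN_IDS = ["span1", "span2", "span3"]  # stand-in for the ids found in the HTML


def match_ids(span_ids, css_file, rewind):
    """Return (id, css line) pairs; rewind controls whether seek(0) is called."""
    matches = []
    for span_id in span_ids:
        if rewind:
            css_file.seek(0)  # move back to the start before each inner pass
        for line in css_file:  # a file object is consumed as it is iterated
            if span_id in line:
                matches.append((span_id, line.strip()))
    return matches


without_seek = match_ids(SPAN_IDS, io.StringIO(CSS_TEXT), rewind=False)
with_seek = match_ids(SPAN_IDS, io.StringIO(CSS_TEXT), rewind=True)

print(without_seek)  # only the first id finds a match
print(with_seek)     # every id finds its line
```

Without the rewind, the first id consumes the whole file and the remaining ids iterate over nothing, which matches the single line of output in the question.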

Answer 2

If, as I assume, the tree is fully loaded in memory, you can try swapping the loops. That way you only go through the file filein once:

def f1():

    with open(filein, 'rU') as f:
        for line in f:
            for span in tree.iterfind('//span'):
                if span and span.attrib.has_key('id'):
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print x, line
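
The same idea can be sketched in Python 3 as a self-contained example. The assumptions here are mine, not the question's: the standard library xml.etree.ElementTree is used in place of lxml, io.StringIO stands in for file.css, and the CSS lines are invented sample data. The ids are collected once up front, so the file is read in a single pass.

```python
import io
import xml.etree.ElementTree as ET

# Sample HTML modelled on the question; it is well-formed, so an XML parser accepts it.
HTML_TEXT = """<html><body>
<p class="cl1" id="id1"><span id="span1"> blabla</span></p>
<p class="cl2" id="id2"><span id="span2"> blablabla</span><span id="span3"> qwqwqw </span></p>
</body></html>"""

# Hypothetical stand-in for file.css.
CSS_TEXT = (
    "span#span1 { font-weight: bold }\n"
    "span#span3 { font-style: italic }\n"
)

root = ET.fromstring(HTML_TEXT)
# Collect the span ids once; the tree lives in memory and can be walked freely.
span_ids = [s.get("id") for s in root.iter("span") if s.get("id")]

matches = []
with io.StringIO(CSS_TEXT) as css_file:
    for line in css_file:          # outer loop: a single pass over the file
        for span_id in span_ids:   # inner loop: an in-memory list, reusable
            if "af" not in span_id and span_id in line:
                matches.append((span_id, line.strip()))

print(matches)
```

One consequence of swapping the loops is that results come out in the order of the CSS file rather than in the order of the spans in the HTML.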

Nested for loop iteration stops
