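I have two input files: an HTML file and a CSS file. I want to perform some operations on the HTML file based on the contents of the CSS file.
My HTML looks like this: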
<html>
<head>
<title></title>
</head>
<body>
<p class = "cl1" id = "id1"> <span id = "span1"> blabla</span> </p>
<p class = "cl2" id = "id2"> <span id = "span2"> blablabla</span> <span id = "span3"> qwqwqw </span> </p>
</body>
</html>
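The styles for the span ids are defined in the CSS file (separately for each span id!).
Before doing the real work (removing spans based on their style), I am trying to print each id from the HTML together with the style description from the CSS that corresponds to it.
Code: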
    from lxml import etree

    tree = etree.parse("file.html")
    filein = "file.css"

    def f1():
        with open(filein, 'rU') as f:
            for span in tree.iterfind('//span'):
                for line in f:
                    if span and span.attrib.has_key('id'):
                        x = span.get('id')
                        if "af" not in x and x in line:
                            print x, line

    def main():
        f1()
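So there are two for loops. Each one iterates perfectly on its own, but when they are nested in this function, the iteration stops after the first pass of the outer loop: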
>> span1 span`#span1 { font-weight: bold; font-size: 11.0pt; font-style: normal; letter-spacing: 0em }
How can I fix this?
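Answer 0 (score: 1)
This happens because the file has already been read to the end by the time the second iteration of the outer loop starts. To make it work, add f.seek(0) before starting the inner loop over the file: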
    with open(filein, 'rU') as f:
        for span in tree.iterfind('//span'):
            f.seek(0)  # rewind the file before every inner pass
            for line in f:
                if span and span.attrib.has_key('id'):
                    x = span.get('id')
                    if "af" not in x and x in line:
                        print x, line
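The effect described in this answer can be reproduced without lxml. The following is only a minimal sketch in Python 3, assuming an in-memory `io.StringIO` as a stand-in for the opened CSS file and a plain list in place of the span ids from the tree; the sample rules and the helper `match_ids` are made up for illustration.

```python
import io

# Stand-in for the opened CSS file; the two rules are made-up sample data.
css = io.StringIO("#span1 { font-weight: bold }\n#span2 { font-style: italic }\n")
span_ids = ["span1", "span2"]  # stand-in for the ids read from the HTML tree


def match_ids(f, ids, rewind):
    """Return (id, line) pairs, optionally rewinding f before each inner pass."""
    matches = []
    for span_id in ids:
        if rewind:
            f.seek(0)  # move the read position back to the start of the file
        for line in f:  # yields nothing once the file has been read to the end
            if span_id in line:
                matches.append((span_id, line.strip()))
    return matches


without_rewind = match_ids(css, span_ids, rewind=False)
css.seek(0)  # reset before the second experiment
with_rewind = match_ids(css, span_ids, rewind=True)

print(without_rewind)  # only the first id finds its rule
print(with_rewind)     # every id finds its rule
```

Without the rewind, the inner loop has nothing left to read for the second id, which matches the behaviour reported in the question.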
Answer 1 (score: 1)
If, as I assume, the tree is fully loaded in memory, you can try inverting the loops. That way you only go through the file filein once:
    def f1():
        with open(filein, 'rU') as f:
            for line in f:  # the file is read only once
                for span in tree.iterfind('//span'):  # the in-memory tree can be searched repeatedly
                    if span and span.attrib.has_key('id'):
                        x = span.get('id')
                        if "af" not in x and x in line:
                            print x, line
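The inverted loop order can also be tried in isolation. This is a hedged Python 3 sketch, again assuming an `io.StringIO` in place of file.css and a plain list in place of the ids from the tree; the sample rules and the helper name `match_single_pass` are hypothetical.

```python
import io

# Made-up sample data standing in for file.css and the ids from the HTML.
css = io.StringIO(
    "#span1 { font-weight: bold }\n"
    "#span2 { font-style: italic }\n"
    "#span3 { font-size: 11.0pt }\n"
)
span_ids = ["span1", "span2", "span3"]


def match_single_pass(f, ids):
    """Collect (id, line) pairs while reading f exactly once."""
    matches = []
    for line in f:            # outer loop: the file is consumed only once
        for span_id in ids:   # inner loop: a list can be iterated repeatedly
            if span_id in line:
                matches.append((span_id, line.strip()))
    return matches


single_pass = match_single_pass(css, span_ids)
for span_id, rule in single_pass:
    print(span_id, rule)
```

One consequence of this order is that results come out in the order of the lines in the CSS file rather than in the order of the spans in the HTML, which may or may not matter for the later removal step.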