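I am trying to remove every <question> element whose <answer> is -66.
But I get the following error: TypeError: argument of type 'NoneType' is not iterable, raised at the line: if element.tag == 'answer' and '-66' in element.text:
What is wrong here? Any help? My code: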
#!/usr/local/bin/python2.7
# -*- coding: UTF-8 -*-
from lxml import etree
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>
"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
Answer 0 (score: 1)
element.text is None for one of the elements. The error says that Python cannot search None for "-66", so first check that element.text is not None, like this:
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
print etree.tostring(html)
The line in the XML that fails is <answer></answer>, which has no text between the tags.
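To make the None case concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree (an assumption, so that it runs without extra installs; lxml also reports None for an element with no text). The has_marker helper is a hypothetical name, not part of either library.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

empty = ET.fromstring("<answer></answer>")
filled = ET.fromstring("<answer>-66</answer>")

# An element with nothing between its tags has no text at all, not "".
print(empty.text is None)
print(filled.text)


def has_marker(element, marker="-66"):
    """Hypothetical helper: True if marker occurs in the element's text."""
    # Fall back to an empty string so that `in` never receives None.
    return marker in (element.text or "")


print(has_marker(empty))
print(has_marker(filled))
```

The (element.text or "") fallback is an alternative to the explicit element.text check above; both avoid applying the in operator to None.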
Edit (regarding the second part of the question, about the tags being run together in the output):
You can use BeautifulSoup like this:
from lxml import etree
import BeautifulSoup
planhtmlclear_utf=u"""
<questionaire>
<question>
<questiontext>What's up?</questiontext>
<answer></answer>
</question>
<question>
<questiontext>Cool?</questiontext>
<answer>-66</answer>
</question>
</questionaire>"""
html = etree.fromstring(planhtmlclear_utf)
questions = html.xpath('/questionaire/question')
for question in questions:
    for element in question.getchildren():
        if element.tag == 'answer' and element.text and '-66' in element.text:
            html.xpath('/questionaire')[0].remove(question)
soup = BeautifulSoup.BeautifulStoneSoup(etree.tostring(html))
print soup.prettify()
This prints:
<questionaire>
 <question>
  <questiontext>
   What's up?
  </questiontext>
  <answer>
  </answer>
 </question>
</questionaire>
BeautifulSoup is a separate module that you need to download and install.
Or, to do the same thing more compactly:
from lxml import etree
import BeautifulSoup
# abbreviating to reduce answer length...
planhtmlclear_utf=u"<questionaire>.........</questionaire>"
html = etree.fromstring(planhtmlclear_utf)
[question.getparent().remove(question) for question in html.xpath('/questionaire/question[answer/text()="-66"]')]
print BeautifulSoup.BeautifulStoneSoup(etree.tostring(html)).prettify()
Answer 1 (score: 1)
An alternative to checking whether element.text is None is to refine your XPath:
questions = html.xpath('/questionaire/question[answer/text()="-66"]')
for question in questions:
    question.getparent().remove(question)
The brackets [...] mean "such that". So:
question        # find all question elements
[               # such that
  answer        # it has an answer subelement
  /text()       # whose text
  =             # equals
  "-66"         # "-66"
]
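The same "such that" filtering can be sketched with the standard library alone. This is only an illustration under an assumption: xml.etree.ElementTree is used in place of lxml, and its limited XPath subset has no text() or getparent(), so the predicate is written as [answer='-66'] and the parent is taken to be the root element.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

doc = """
<questionaire>
  <question>
    <questiontext>What's up?</questiontext>
    <answer></answer>
  </question>
  <question>
    <questiontext>Cool?</questiontext>
    <answer>-66</answer>
  </question>
</questionaire>
"""

root = ET.fromstring(doc)

# Select question elements such that they have an answer child whose
# text equals "-66". The path is relative to root (the questionaire).
matches = root.findall("question[answer='-66']")

for question in matches:
    # ElementTree has no getparent(); every match is a direct child of root.
    root.remove(question)

remaining = [q.findtext("questiontext") for q in root.findall("question")]
print(remaining)
```

Note that an equality predicate matches the text exactly, whereas the loop-based versions above test whether '-66' occurs anywhere in the text, so the two approaches differ for answers such as "-660".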