I have a string like this:
<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>
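What is the best way to remove everything between the "<" and ">" characters, so that only "This house believes that society benefits when we share personal information online." is left?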
Answer 0 (score: 0)
Here is one way (I am not sure whether it is the "best"):
>>> from xml.etree.ElementTree import XML
>>> s = '<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>'
>>> x = XML(s)
>>> x.text
'This house believes that society benefits when we share personal information online.'
>>>
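One caveat worth sketching: `.text` only returns the text that comes before an element's first child, so it is enough for the string in the question but not for markup with nested tags. The variant string below, with an added `<em>` tag, is a hypothetical example and not part of the question; `itertext()` is the standard `Element` method that yields the text of the element and all of its descendants.

```python
from xml.etree.ElementTree import XML

# Hypothetical variant of the question's string, with a nested <em> tag added.
nested = '<h2 class="debateHeaderProp">This house believes that <em>society</em> benefits.</h2>'
root = XML(nested)

# .text stops at the first child element.
leading_text = root.text

# itertext() yields every text fragment in document order, including
# text inside child elements and text that follows them.
full_text = "".join(root.itertext())

print(repr(leading_text))
print(full_text)
```

For a flat string like the one in the question, `x.text` and `"".join(x.itertext())` give the same result.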
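Answer 1 (score: 0)
XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree. Interactions with the whole document (reading from and writing to files) are usually done at the ElementTree level. Interactions with a single XML element and its sub-elements are done at the Element level.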
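To make the two levels concrete, here is a minimal sketch using the string from the question. The `io.StringIO` wrapper is an assumption that stands in for a real file on disk; with an actual file you would pass its path to `ET.parse` instead.

```python
import io
import xml.etree.ElementTree as ET

s = '<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>'

# ElementTree level: parse a whole document from a file-like object.
# io.StringIO is only a stand-in for a real file here.
tree = ET.parse(io.StringIO(s))

# Element level: the root node and its tag, attributes, and text.
root = tree.getroot()
tag = root.tag
css_class = root.get("class")
text = root.text

print(tag, css_class)
print(text)
```

For a single in-memory string, going straight to the Element level with `ET.fromstring(s)` (or `XML(s)`) is usually simpler.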
You can also use a regular expression:
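>>> import re
>>> s = '<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>'
>>> re.search(r'(?<=>).*(?=<)', s).group(0)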
'This house believes that society benefits when we share personal information online.'
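The lookaround pattern above relies on the string holding a single pair of tags. As a rough alternative sketch, `re.sub` can delete every tag instead. The helper name `strip_tags` and the pattern are illustrative choices, not from the answer, and the pattern assumes that no attribute value contains a `>` character; for arbitrary HTML a real parser is the safer option.

```python
import re

s = '<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>'

# Matches "<", then one or more characters that are not ">", then ">".
# Assumption: no attribute value contains a literal ">".
TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(markup):
    """Return markup with every <...> tag removed (hypothetical helper)."""
    return TAG_RE.sub("", markup)

stripped = strip_tags(s)
print(stripped)
```

Unlike the `re.search` version, this also copes with several sibling tags in one string, for example `strip_tags('<b>a</b> and <i>b</i>')`.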
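Answer 2 (score: 0)
For a single line of markup, a dedicated parser is somewhat overkill. For larger datasets, however, a parser such as BeautifulSoup is the way to go. See the example below.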
from bs4 import BeautifulSoup as bsoup
import re
markup = """
<h2 class="debateHeaderProp">This house believes that society benefits when we share personal information online.</h2>
<span class="debateFormat">Oregon-Oxford, Cross Examination</span>
<div class="debateAffirmSide">On the affirmative: Foo Debate Club</div>
<div class="debateOpposeSide">On the opposition: Bar Debate Club</div>
"""
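# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = bsoup(markup, "html.parser")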
# Explicitly define the tag and class.
motion = soup.find("h2", class_="debateHeaderProp").get_text()
# Or just use the class.
d_format = soup.find(class_="debateFormat").get_text()
# And even use regex for more power.
teams = [t.get_text() for t in soup.find_all("div", class_=re.compile(r".*debate.*Side.*"))]
print("Our Debate for Today")
print("Motion:", motion)
print("Format:", d_format)
print(teams[0])
print(teams[1])
# Prints the following:
# Our Debate for Today
# Motion: This house believes that society benefits when we share personal information online.
# Format: Oregon-Oxford, Cross Examination
# On the affirmative: Foo Debate Club
# On the opposition: Bar Debate Club
Another option is to use an XML parser such as lxml.