I have the following Python code (in PyCharm Community Edition):
def defer_tags(sentence):
    for letter in sentence:
        print("current letter =", letter)
        if letter == '<':
            # Drop everything up to and including the closing '>'
            end_tag = sentence.find('>')
            sentence = sentence[end_tag+1:]
            print("new_sentence =", sentence)

defer_tags("<h1>Hello")
It produces the following output:
current letter = <
new_sentence = Hello
current letter = h
current letter = 1
current letter = >
current letter = H
current letter = e
current letter = l
current letter = l
current letter = o
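Why does the loop (letter) walk through the entire string (sentence) even though the value of sentence changes inside the loop?
I print the value of sentence after changing it, but the change is not reflected in the loop iterations.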
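The behavior described above can be reproduced in a small sketch. A `for` statement evaluates its iterable once and then iterates over that original string object; assigning a new value to `sentence` inside the body only rebinds the name and does not touch the running iterator. The names `letters_seen` and `strip_tags` are hypothetical and used only for illustration; `strip_tags` is one possible way to get the intended result, not the only one.

```python
def letters_seen(sentence):
    """Return every letter the loop visits while `sentence` is rebound mid-loop."""
    seen = []
    for letter in sentence:  # the iterator over the original string is created once, here
        seen.append(letter)
        if letter == '<':
            end_tag = sentence.find('>')
            # Creates a new string and rebinds the name; the iterator keeps the old string
            sentence = sentence[end_tag + 1:]
    return seen


def strip_tags(sentence):
    """Hypothetical alternative: build the result instead of rebinding `sentence`."""
    result = []
    inside_tag = False
    for letter in sentence:
        if letter == '<':
            inside_tag = True
        elif letter == '>':
            inside_tag = False
        elif not inside_tag:
            result.append(letter)
    return ''.join(result)


print(letters_seen("<h1>Hello"))
print(strip_tags("<h1>Hello"))
```

Tracking an `inside_tag` flag avoids changing the string that is being iterated, so the loop only has to read each character once.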
Answer 0 (score: 0)
To be explicit, try using Beautiful Soup:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<h1>Hello</h1>', 'html.parser')
>>> soup.text
'Hello'
Answer 1 (score: -1)
A better way to capture the text between tags is to use re.
import re

def defer_tags(sentence):
    # [^<>]+ captures text between a '>' and the next '<' without spanning other tags
    return re.findall(r'>([^<>]+)<', sentence)

defer_tags('<h1>Hello</h1>')
# ['Hello']
defer_tags('<h1>Hello</h1><h2>Ahoy</h2>')
# ['Hello', 'Ahoy']
This works as long as the tags are complete, i.e. <h2>Hello</h2>, <h1>Ahoy</h1> <h2>XX</h2>, and so on.
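To illustrate the caveat about complete tags, here is a minimal sketch. The function name `text_between_tags` and the pattern `r'>([^<>]+)<'` are assumptions chosen for this example. Because the pattern needs a `<` after the text, input with a missing closing tag yields nothing.

```python
import re


def text_between_tags(sentence):
    # Capture runs of non-bracket characters that sit between '>' and the next '<'
    return re.findall(r'>([^<>]+)<', sentence)


print(text_between_tags('<h2>Hello</h2>'))  # closing tag present: the text is found
print(text_between_tags('<h1>Hello'))       # no closing tag: nothing is captured
```

For input that may be incomplete or nested, an HTML parser is usually a safer choice than a regular expression.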