How can I wrap a word in a tag using BeautifulSoup?

Asked: 2017-05-13 16:57:59

Tags: python beautifulsoup

I need to use BeautifulSoup to wrap each word from a list of words in a span.

For example,

<html><body>word-one word-two word-one</body></html>

needs to become

<html><body><span>word-one</span> word-two <span>word-one</span></body></html>

where each word-one needs to be moved into a span.

So far, I can find these elements using:

for html_element in soup(text=re.compile('word-one')):
    print(html_element)

However, it is not clear to me how to replace the matched text with a span.

1 answer:

Answer 0: (score: 3)

I did something similar. Here the variable html holds your markup, <html><body>word-one word-two word-one</body></html>. I separate the text from the markup and then put them back together.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
text = soup.text  # Only the text from the soup

soup.body.clear()  # Remove everything between the body tags

new_text = text.split()  # Split on whitespace to get the individual words

# Note: this wraps every word, and the whitespace between words is not re-added
for i in new_text:
    new_tag = soup.new_tag('span')  # Create a new tag
    new_tag.append(i)  # Append the current word to it
    # For example, appending 'word' to new_tag('a') gives <a>word</a>
    soup.body.append(new_tag)  # Append the whole tag, e.g. <span>word-one</span>
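
The loop above wraps every word. A minimal sketch of a variant that wraps only words from a given list and keeps the original whitespace might look like the following. The names targets and result are made up for illustration, and the sketch assumes the body contains only plain text, as in the question.

```python
import re
from bs4 import BeautifulSoup

html = "<html><body>word-one word-two word-one</body></html>"
targets = {"word-one"}  # hypothetical list of words to wrap

soup = BeautifulSoup(html, "html.parser")
text = soup.body.get_text()
soup.body.clear()

# Splitting on a captured whitespace group keeps the separators in the list,
# so the spacing of the original text is preserved.
for token in re.split(r"(\s+)", text):
    if token in targets:
        span = soup.new_tag("span")
        span.string = token
        soup.body.append(span)
    elif token:
        soup.body.append(token)  # other words and whitespace stay unwrapped

result = str(soup)
print(result)
```

Because the comparison is done per whitespace-separated token, a word followed directly by punctuation (for example "word-one,") would not be wrapped by this sketch.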

We can also use a regular expression to match particular words.

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
text = soup.text  # Only the text from the soup

soup.body.clear()  # Remove everything between the body tags

# Find the first run of word characters; '-' is not matched by \w,
# so for 'word-one' this matches only 'word'
theword = re.search(r'\w+', text)
beginning, end = theword.start(), theword.end()

soup.body.append(text[:beginning])  # Add the text before the match

new_tag = soup.new_tag('span')  # Create a new tag
new_tag.append(text[beginning:end])  # Put the matched word inside the new tag

soup.body.append(new_tag)  # Append the tag with the matched word
soup.body.append(text[end:])  # Append everything that's left
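
The snippet above handles only the first match. As a rough sketch, the same idea can be extended to every occurrence of one specific word with re.finditer. The variable names word, position, and wrapped are illustrative, and the sketch again assumes the body holds only plain text. Note that the pattern matches substrings too, so it would also hit "word-one" inside a longer token.

```python
import re
from bs4 import BeautifulSoup

html = "<html><body>word-one word-two word-one</body></html>"
word = "word-one"  # the word to wrap (illustrative)

soup = BeautifulSoup(html, "html.parser")
text = soup.body.get_text()
soup.body.clear()

pattern = re.compile(re.escape(word))  # escape so the word is matched literally
position = 0  # end of the previous match
for match in pattern.finditer(text):
    if match.start() > position:
        # Plain text between the previous match and this one
        soup.body.append(text[position:match.start()])
    span = soup.new_tag("span")
    span.string = match.group()
    soup.body.append(span)
    position = match.end()
if position < len(text):
    soup.body.append(text[position:])  # whatever follows the last match

wrapped = str(soup)
print(wrapped)
```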

I'm sure we could use .insert in a similar way.
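
One way to act on that idea, sketched here with insert_before (a relative of .insert) rather than .insert itself, is to edit each matching text node in place instead of clearing the whole body. This keeps any nested markup intact. The helper name wrap_word and the sample HTML with <p> and <b> tags are assumptions made for illustration, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup


def wrap_word(soup, word, tag_name="span"):
    """Wrap each occurrence of `word` found in the soup's text nodes."""
    pattern = re.compile("(" + re.escape(word) + ")")
    # Copy the results into a list first, because the loop modifies the tree.
    for node in list(soup.find_all(string=pattern)):
        # With a capturing group, split() puts the matches at odd indices.
        for index, part in enumerate(pattern.split(str(node))):
            if index % 2 == 1:
                tag = soup.new_tag(tag_name)
                tag.string = part
                node.insert_before(tag)
            elif part:
                node.insert_before(part)  # unmatched text stays plain
        node.extract()  # drop the original text node, now duplicated
    return soup


html = "<html><body><p>word-one <b>word-two</b> word-one</p></body></html>"
soup = wrap_word(BeautifulSoup(html, "html.parser"), "word-one")
nested = str(soup)
print(nested)
```

The trade-off is that a word split across two text nodes (for example by an inline tag in the middle) would not be found, since each text node is searched on its own.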