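I'm using BeautifulSoup to find a word entered by the user on a specific page and highlight every occurrence of it. For example, I want to highlight every occurrence of the word 'Finance' on the page https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ: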
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import re
from bs4 import BeautifulSoup

html = urllib2.urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html, 'html.parser')
# Finds only text nodes whose entire content is exactly 'Finance'
matches = soup.body(text='Finance')
for match in matches:
    # Wrap each matched text node in a highlighted <span>
    match.wrap(soup.new_tag('span', style="background-color:#FE00FE"))
print soup
```
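Calling a tag, as in `soup.body(text='Finance')`, is shorthand for `find_all(text='Finance')`, and a plain string there matches only text nodes whose entire content equals 'Finance', so the word is missed wherever it sits inside a longer sentence. Highlighting it there means splitting the text around each match first. Below is a minimal sketch of that splitting step using only the standard `re` module. The helper name `split_on_word` is hypothetical, and the sketch assumes whole-word, case-insensitive matching, so it will not match words such as "refinance".

```python
import re


def split_on_word(text, word):
    """Split `text` into (segment, is_match) pairs around whole-word,
    case-insensitive occurrences of `word`."""
    # The capturing group makes re.split keep the matched words in the
    # result; they always land at the odd indices of the returned list.
    pattern = re.compile(r'\b(' + re.escape(word) + r')\b', re.IGNORECASE)
    parts = pattern.split(text)
    # Enumerate before filtering so the odd/even index test stays correct.
    return [(part, i % 2 == 1) for i, part in enumerate(parts) if part]


print(split_on_word("Google Finance and finance news", "finance"))
```

Each `True` segment is what would go inside a highlighted `<span>`, and each `False` segment stays plain text.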
Answer (score: 0)
I found this regular-expression variant for highlighting words, but the resulting document contains broken JavaScript:
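```python
import urllib2
import re
from bs4 import BeautifulSoup

html = urllib2.urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html, 'html.parser')
# Visit every text node in <body>, including the contents of <script> tags
for text in soup.body.findAll(text=True):
    if re.search(r'inance\b', text):
        # Wrap every word ending in "inance" in a highlighted <span>
        new_html = "<p>" + re.sub(r'(\w*)inance\b', r'<span style="background-color:#FF00FF">\1inance</span>', text) + "</p>"
        new_soup = BeautifulSoup(new_html, 'html.parser')
        # Replaces the whole parent element (e.g. a <script>) with the new <p>
        text.parent.replace_with(new_soup.p)
print soup
```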
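The broken JavaScript most likely comes from two things in the answer's code: `findAll(text=True)` also returns the text inside `<script>` tags, and `text.parent.replace_with(...)` swaps out the entire parent element (a script included) for a new `<p>`. Below is a minimal sketch of an alternative, assuming a bs4 version recent enough to accept the `string=` argument (older versions spell it `text=`). It skips text inside `<script>`, `<style>` and similar tags, skips comments, and replaces only the matching text node, building the new strings and `<span>` tags through bs4 instead of re-parsing concatenated HTML. The function `highlight`, the `SKIP_TAGS` set and the sample HTML are illustrative assumptions, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Hypothetical set of tags whose text must not be touched; rewriting it
# would break scripts, styles or form contents.
SKIP_TAGS = {'script', 'style', 'noscript', 'textarea', 'title'}


def highlight(soup, word, style='background-color:#FF00FF'):
    """Wrap whole-word, case-insensitive matches of `word` in <span> tags."""
    pattern = re.compile(r'\b(' + re.escape(word) + r')\b', re.IGNORECASE)
    # Copy the results into a list because the tree is modified below.
    for text in list(soup.find_all(string=pattern)):
        # Skip comments, CDATA and script/style strings (NavigableString
        # subclasses), and any text whose parent is a tag we must not touch.
        if type(text) is not NavigableString or text.parent.name in SKIP_TAGS:
            continue
        # The capturing group keeps the matched words at the odd indices.
        for i, part in enumerate(pattern.split(text)):
            if not part:
                continue
            if i % 2 == 1:
                span = soup.new_tag('span', style=style)
                span.string = part
                text.insert_before(span)
            else:
                text.insert_before(NavigableString(part))
        # Remove the original text node; its content now lives in the new nodes.
        text.extract()
    return soup


html = ('<html><head><script>var financeData = "Finance";</script></head>'
        '<body><p>Google Finance and <b>finance</b> news</p>'
        '<!-- Finance --></body></html>')
soup = highlight(BeautifulSoup(html, 'html.parser'), 'finance')
print(soup.p)
```

Because only the text node itself is replaced, the surrounding `<p>`, its attributes and its other children stay intact, and the script's source is never rewritten.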