<html>
<body>
<p>A <span>die</span> is thrown \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from both the throws?</p>
<p> Test </p>
</body>
</html>
I am trying to enclose \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) inside a span tag. I can do this when "is thrown \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from both the throws?" is a single NavigableString, but in some cases "is thrown \(x = {-b \pm", "\sqrt{b^2-4ac}" and "\over 2a}\) twice. What is the probability of getting a sum 7 from both the throws?" are split into three NavigableStrings. So is there a way to merge consecutive NavigableStrings into a single NavigableString using BeautifulSoup?
This is the code I use to wrap the formula in a span tag when \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) lies within a single NavigableString:
import re
from bs4 import NavigableString

# Matches an inline formula delimited by \( and \), including across newlines.
mathml_regex = re.compile(r'\\\(.*?\\\)', re.DOTALL)

def mathml_wrap(soup):
    for p_tags in soup.find_all('p'):
        for p_child in p_tags.children:
            try:
                # re.search raises TypeError when p_child is a Tag, not a string.
                match = re.search(mathml_regex, p_child)
                if match:
                    start = match.start()
                    end = match.end()
                    text = p_child
                    # Split into: text before, <span>formula</span>, text after.
                    new_str = NavigableString(text[:start])
                    p_child.replace_with(new_str)
                    new_str1 = NavigableString(text[end:])
                    span_tag = soup.new_tag("span", **{'class': 'math-tex'})
                    span_tag.string = text[start:end]
                    new_str.insert_after(span_tag)
                    span_tag.insert_after(new_str1)
            except TypeError:
                pass
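A minimal check of why the regex approach above only helps in the single-string case. It assumes the `html.parser` backend; the helper `strings_matching` and the two sample strings are hypothetical and only illustrate the point. The pattern is tested against each direct string child of the first `<p>`, so a formula interrupted by a tag never matches as a whole.

```python
import re
from bs4 import BeautifulSoup, NavigableString

mathml_regex = re.compile(r'\\\(.*?\\\)', re.DOTALL)

def strings_matching(html):
    # Hypothetical helper: return the direct string children of the first <p>
    # that contain a complete formula, i.e. both delimiters in one string.
    soup = BeautifulSoup(html, 'html.parser')
    return [str(child) for child in soup.p.children
            if isinstance(child, NavigableString) and mathml_regex.search(child)]

# Formula inside one string versus formula interrupted by a <span>.
single = r'<p>is thrown \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) twice.</p>'
split = r'<p>is thrown \(x = {-b \pm<span>\sqrt</span>{b^2-4ac} \over 2a}\) twice.</p>'

print(len(strings_matching(single)))  # one string holds the whole formula
print(len(strings_matching(split)))   # no single string holds both delimiters
```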
Edit:
from bs4 import BeautifulSoup
import re

# Raw string so that the LaTeX backslashes are not treated as escape sequences.
html = r"""<p>
A
<span>die</span>
is thrown \(x = {-b \pm
<span>\sqrt</span>
{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from
both the throws?
</p> <p> Test </p>"""
soup = BeautifulSoup(html, 'html.parser')
mathml_start_regex = re.compile(r'\\\(')
mathml_end_regex = re.compile(r'\\\)')
for p_tags in soup.find_all('p'):
    match = 0  # Flag set to 1 if '\(' is found and set back to 0 if '\)' is found.
    for p_child in p_tags.children:
        try:  # Captures Tags that contain \(
            if re.findall(mathml_start_regex, p_child.text):
                match += 1
        except AttributeError:  # Captures NavigableStrings that contain \(
            if re.findall(mathml_start_regex, p_child):
                match += 1
        try:  # Replaces a Tag with the Tag's text
            if match == 1:
                p_child.replace_with(p_child.text)
        except AttributeError:  # No point in replacing NavigableStrings, they are just strings without Tags
            pass
        try:  # Captures Tags that contain \)
            if re.findall(mathml_end_regex, p_child.text):
                match = 0
        except AttributeError:  # Captures NavigableStrings that contain \)
            if re.findall(mathml_end_regex, p_child):
                match = 0
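After I process my soup with the code above, which removes the span tags between \( and \), the pieces "is thrown \(x = {-b \pm", "\sqrt" and "{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from both the throws?" end up as 3 separate NavigableStrings in my soup object.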
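The split can be reproduced in a few lines, which may help when inspecting a real document. This is only a sketch with a shortened, hypothetical sample string; it uses `Tag.unwrap()`, the built-in way to drop a tag while keeping its contents, in place of the flag-based loop above. Beautiful Soup does not merge neighbouring strings when a tag is removed, so the paragraph ends up with three adjacent NavigableStrings.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened, hypothetical version of the paragraph from the question.
html = r'<p>is thrown \(x = {-b \pm<span>\sqrt</span>{b^2-4ac} \over 2a}\) twice.</p>'
soup = BeautifulSoup(html, 'html.parser')

# Remove the <span> but keep its text as a child of <p>.
soup.p.span.unwrap()

# Print each child's type and content to show where the formula is split.
for child in soup.p.contents:
    print(type(child).__name__, repr(str(child)))
```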
Answer 0 (score: 0)
I'm not sure I understood your question correctly, but as you said, you want to join the strings inside these <p> tags.
I used this as the input:
mystr = r"""<html>
<body>
<p>A <span>die</span> is thrown \(x = {-b \pm\sqrt{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from both the throws?</p>
<p> Test </p>
</body>
</html>"""
So this is what I did:
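from bs4 import BeautifulSoup

soup = BeautifulSoup(mystr, "lxml")
my_p = soup.find_all("p")
for p in my_p:
    # .text concatenates every string inside the <p>, including nested tags.
    print(p.text)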
This extracts the full text inside each <p> tag. Let me know if your question is about something else.
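If the goal is to merge the adjacent strings in the tree rather than only read the text, one option may be `Tag.smooth()`, which consolidates consecutive NavigableStrings. This assumes Beautiful Soup 4.8.0 or later. The sketch below is not a definitive implementation: `MATH_RE` and `wrap_math` are hypothetical names, the sample string is shortened, and the `unwrap()` call merely stands in for whatever step removes the tags inside the formula. It wraps only the first formula in each string.

```python
import re
from bs4 import BeautifulSoup, NavigableString

MATH_RE = re.compile(r'\\\(.*?\\\)', re.DOTALL)

def wrap_math(soup):
    # Hypothetical helper: merge adjacent strings, then wrap each formula.
    for p in soup.find_all('p'):
        p.smooth()  # consolidate consecutive NavigableStrings (bs4 >= 4.8.0)
        for child in list(p.children):  # snapshot, since the tree is edited below
            if not isinstance(child, NavigableString):
                continue
            m = MATH_RE.search(child)
            if m is None:
                continue
            text = str(child)
            span = soup.new_tag('span', attrs={'class': 'math-tex'})
            span.string = m.group(0)
            # Replace the string with: text before, <span>formula</span>, text after.
            before = NavigableString(text[:m.start()])
            child.replace_with(before)
            before.insert_after(span)
            span.insert_after(NavigableString(text[m.end():]))

html = r'<p>A <span>die</span> is thrown \(x = {-b \pm<span>\sqrt</span>{b^2-4ac} \over 2a}\) twice.</p>'
soup = BeautifulSoup(html, 'html.parser')
soup.p.find_all('span')[1].unwrap()  # stand-in for removing tags inside the formula
wrap_math(soup)
print(soup.p)
```

Calling `smooth()` first means the regex sees the whole formula in one string, so the single-string wrapping logic from the question can be reused unchanged.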