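<br/> produces a line break. However, if I replace it with a space or call strip(), the separate address lines collapse into a single line.
How can I keep the address on separate lines, as shown in the expected output below? Input from the HTML: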
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
My code is as follows:
if not (item.find('span', class_ = 'c2') is None):
    address = item.find_all('span', class_ = 'c2')
    for a in item.find_all('span', {"class":"c2"}):
        for addr in address:
            print('Before',addr)
            if addr.find_all("br"):
                for a in addr:
                    print('a',a)
                    if '<br/>' in a:
                        print('a loop',a)
My output for the class (c2) is as follows:
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
The test output from inside the span loop is as follows:
Before <span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br/>Karachi - 75640<br/>Pakistan</span>
a 1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
a <br/>
a Karachi - 75640
a <br/>
a Pakistan
This leads to my current, undesired output:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
Karachi - 75640
Pakistan
Expected output:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
Karachi - 75640
Pakistan
Answer 0 (score: 0)
You can use the replace_with() method of the tag object:
from bs4 import BeautifulSoup
data = '''<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />'''
soup = BeautifulSoup(data, 'lxml')
for br in soup.select('br'):
    br.replace_with('\n')
print(soup.text.strip())
Prints:
1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),
Karachi - 75640
Pakistan
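If the page contains more than one address, the same replace_with() idea can be applied per span so that each address stays separate. The following is only a sketch under assumptions: extract_addresses is a hypothetical helper name, and it uses the built-in 'html.parser' instead of 'lxml' to avoid the extra dependency.

```python
from bs4 import BeautifulSoup


def extract_addresses(html):
    """Return one newline-separated string per <span class="c2"> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for span in soup.find_all("span", class_="c2"):
        # find_all() returns a list, so replacing tags while looping is safe.
        for br in span.find_all("br"):
            br.replace_with("\n")
        addresses.append(span.get_text().strip())
    return addresses


html = (
    '<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, '
    "(Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />"
)

for address in extract_addresses(html):
    print(address)
```

Only the br tags inside each span are touched, so a trailing br outside the span does not add a stray blank line.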
Answer 1 (score: 0)
You can use stripped_strings and join:
from bs4 import BeautifulSoup as bs
html = '''
<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, (Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />
'''
soup = bs(html, 'lxml')
for item in soup.select('.c2'):
    strings = '\n'.join([string for string in item.stripped_strings])
    print(strings)
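A shorter variant that should give the same result is get_text() with a separator and strip=True, which joins the stripped text pieces of the tag for you. This is a sketch rather than part of the answer above; it assumes the built-in 'html.parser', and the variable name addresses is just illustrative.

```python
from bs4 import BeautifulSoup

html = (
    '<span class="c2">1233/B, LAC II, St. 37/B, Mehmoodabad # 6, '
    "(Behind United Bakery),<br />Karachi - 75640<br />Pakistan</span><br />"
)

soup = BeautifulSoup(html, "html.parser")

# get_text(separator, strip=True) strips each text piece and joins the
# non-empty ones with the separator, so each <br/> boundary becomes "\n".
addresses = [
    span.get_text(separator="\n", strip=True)
    for span in soup.find_all("span", class_="c2")
]

for address in addresses:
    print(address)
```

Unlike replace_with(), this leaves the parsed tree unmodified, which may matter if the same soup is reused later.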