How do I add spacing between paragraphs?
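Right now the output is all bunched together, and I want to put spacing between the paragraphs.
I've seen others use get_text with a separator, but I'm not using it here.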
from urllib.request import urlopen
from bs4 import BeautifulSoup
# specify the url
url = "https://www.bbc.com/sport/football/50944416"
# Connect to the website and return the html to the variable ‘page’
try:
    page = urlopen(url)
except Exception:
    print("Error opening the URL")
    raise SystemExit(1)  # stop here; otherwise `page` is undefined below
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')
# Find the <div> that holds the article body
content = soup.find('div', {"class": "story-body sp-story-body gel-body-copy"})
article = ''
for i in content.find_all('p'):
    article = article + ' ' + i.text
print(article)
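Since the question mentions the get_text separator, here is a minimal sketch of what it does. It assumes a small hardcoded HTML snippet in place of the BBC page, and compares `get_text(separator=..., strip=True)` on the whole `<div>` with joining the text of each `<p>` separately.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the article <div>
html = "<div><p>First paragraph.</p><p>Second <b>bold</b> one.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# The separator is inserted between every text node, including inside a <p>
by_node = soup.div.get_text(separator="\n\n", strip=True)

# Joining per-<p> text keeps inline tags such as <b> on the same line
by_paragraph = "\n\n".join(p.get_text().strip() for p in soup.find_all("p"))

print(by_node)
print("---")
print(by_paragraph)
```

With the separator approach, the `<b>` inside the second paragraph gets split onto its own lines, which is why joining one string per `<p>` is usually the safer way to get blank lines between paragraphs only.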
Answer 0 (score: 1)
You can use textwrap from the standard library to set the length of each line, and add a blank line after each <p> paragraph:
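from urllib.request import urlopen
from bs4 import BeautifulSoup
import textwrap

url = "https://www.bbc.com/sport/football/50944416"
soup = BeautifulSoup(urlopen(url), 'html.parser')
content = soup.find('div', {"class": "story-body sp-story-body gel-body-copy"})

line_size = 75
# Wrap at line_size characters, breaking lines only between words
w = textwrap.TextWrapper(width=line_size, break_long_words=False, replace_whitespace=False)
article = ''
for i in content.find_all('p'):
    body = '\n'.join(w.wrap(i.text))
    article += body + "\n\n"  # blank line after each paragraph
print(article)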
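To try the wrapping without fetching the page, here is a small offline sketch. The `paragraphs` list is hypothetical and stands in for the text of each `<p>`; `w.fill(p)` is TextWrapper's shortcut for `'\n'.join(w.wrap(p))`.

```python
import textwrap

# Hypothetical paragraph texts standing in for [p.text for p in content.find_all('p')]
paragraphs = [
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "Short one.",
]

line_size = 20
w = textwrap.TextWrapper(width=line_size, break_long_words=False,
                         replace_whitespace=False)

# Wrap each paragraph, then separate paragraphs with a blank line
article = "\n\n".join(w.fill(p) for p in paragraphs)
print(article)
```

Joining with `"\n\n"` instead of appending it after every paragraph also avoids a trailing blank line at the end of the output.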
You can do this with a loop, but I'd recommend textwrap instead: it breaks lines between words and is much simpler. Anyway, here is the basic manual approach:
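line_size = 75
article = ''
for i in content.find_all('p'):
    text = i.text.strip()
    for n in range(len(text)):
        # Start a new line every line_size characters, but not before the first one
        if n == 0 or n % line_size != 0:
            article += text[n]
        else:
            article += "\n" + text[n]
    article += "\n\n"  # blank line after each paragraph
print(article)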
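The fixed-width split the loop above aims for can also be written with slicing. This is a swapped-in formulation, not the answerer's code; `hard_wrap` and the sample `paragraphs` are hypothetical. It shows why textwrap is preferred: a plain character count cuts words in half.

```python
def hard_wrap(text, line_size):
    """Split text into chunks of line_size characters, one chunk per line.

    Unlike textwrap, this ignores word boundaries and can cut a word in half.
    """
    text = text.strip()
    return "\n".join(text[k:k + line_size] for k in range(0, len(text), line_size))

# Hypothetical paragraph texts
paragraphs = ["football match report", "Short"]
article = "\n\n".join(hard_wrap(p, 8) for p in paragraphs)
print(article)
```

Here "report" ends up split across two lines, which textwrap would avoid.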
Answer 1 (score: 0)
The str.strip() method may help you:
article = ''
for i in content.find_all('p'):
    article = article + '\t' + i.text.strip() + '\n'
print(article)
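A small offline sketch of what that line does, assuming hypothetical raw paragraph strings with the kind of stray whitespace `.text` can carry: `strip()` removes leading and trailing whitespace, `'\t'` indents each paragraph, and `'\n'` puts each one on its own line.

```python
# Hypothetical raw paragraph texts with surrounding whitespace
raw = ["\n  First paragraph.  \n", "  Second paragraph.\n"]

# Indent each stripped paragraph with a tab and end it with a newline
article = ''.join('\t' + t.strip() + '\n' for t in raw)
print(article)
```

This gives one paragraph per line but no empty line between them; ending each paragraph with `'\n\n'` instead would add a blank line.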