Question

My code prints multiple empty lines. How do I remove all the empty space?

Code:

import re
import urllib.request

from bs4 import BeautifulSoup

url = input('enter url moish')
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'lxml')

# Match every <a> whose class attribute contains "itemIncludes"
all = soup.find_all('a', {'class': re.compile('itemIncludes')})
for i in all:
    print(i.text)

Current output (the desired output is the same three lines without the blank lines between them):

Canon EOS 77D DSLR Camera (Body Only)



LP-E17 Lithium-Ion Battery Pack



LC-E17 Charger for LP-E17 Battery Pack

Thanks!

Answer 1

You can drop the empty entries before printing:

items = [item.text for item in all if item.text.strip() != '']

Answer 2

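You can filter the output with a regular expression, for example:

import re

for i in all:
    text = i.text.strip()
    if not re.search(r"^\s*$", text):  # skip blank lines (empty or whitespace only)
        print(text)

Note:

This is only a workaround for the output. The real problem may lie in the find_all arguments, which I could not test.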
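Since the real page is not available, here is a minimal self-contained sketch of the same idea on inline markup. The HTML below is made up for illustration (only the class name itemIncludes comes from the question), and it swaps the regex check for get_text(strip=True), which strips each text fragment and skips the empty ones. It uses the built-in html.parser so that lxml is not required.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the real page's structure is not shown in the question
html = """
<a class="itemIncludes">
    Canon EOS 77D DSLR Camera (Body Only)
</a>
<a class="itemIncludes">   </a>
<a class="itemIncludes">
    LP-E17 Lithium-Ion Battery Pack
</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", {"class": re.compile("itemIncludes")})

# get_text(strip=True) strips every text fragment and skips empty ones,
# so a whitespace-only link yields an empty string
names = [link.get_text(strip=True) for link in links]
names = [name for name in names if name]  # drop the empty results

for name in names:
    print(name)
```

If this prints clean lines but the real page still does not, the extra whitespace probably sits inside the text of a single element rather than in separate empty elements.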

Answer 3

for i in all:
    items = ' '.join(i.text.split())
    print(items)

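The code above collapses every run of whitespace, newlines included, into a single space. An element whose text is only whitespace still prints as one empty line.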
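To see what the split/join idiom does without fetching a page, here is a small sketch on plain strings. The sample strings are invented stand-ins for i.text values, and collapse_whitespace is a hypothetical helper name, not something from Beautiful Soup.

```python
# Hypothetical sample strings standing in for i.text values
samples = [
    "\n  Canon EOS 77D DSLR Camera (Body Only)\n\n",
    "\n\n",
    "LP-E17   Lithium-Ion\nBattery Pack",
]


def collapse_whitespace(text):
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(text.split())


cleaned = [collapse_whitespace(s) for s in samples]

# A whitespace-only input collapses to "", so filter those out before printing
non_empty = [s for s in cleaned if s]

for line in non_empty:
    print(line)
```

Filtering on the cleaned value is what combines this answer with Answer 1: the join removes whitespace inside an item, and the filter removes items that end up empty.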

Answer 4

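I'm sure you have solved this by now, but I'm new to Python and ran into the same problem. I also didn't want to just drop the lines when printing; I wanted to change them in the elements themselves. Here is my solution: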

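import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(getPage(), 'lxml')  # getPage() is the answerer's own helper returning the HTML

for element in soup.find_all():
    # Assigning .string replaces all children, so only touch elements without child tags
    if element.find(True) is None and element.text.strip():
        text = element.text.strip()
        element.string = re.sub(r"[\n][\W]+[^\w]", "\n", text)

print(soup)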

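This loops over the elements, takes the text of each, replaces every instance of "\n followed by whitespace and nothing else" (one way of finding blank lines, though there are surely better ones!), and sets the result back on the element.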
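One caveat with this approach: assigning to .string replaces everything inside an element, child tags included. A possible alternative, sketched below on made-up markup, is to rewrite the text nodes themselves so the tag structure stays untouched. The regex here is a simpler substitute for the one above (it collapses a newline plus any surrounding whitespace into a single newline), not the answerer's original pattern.

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup with blank lines inside the text
html = (
    "<div><p>first line\n\n\n   second line</p>"
    "<span>keep <b>bold</b>\n\n\ntail</span></div>"
)
soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) returns the text nodes; the result is a list,
# so replacing nodes while looping over it is safe
for node in soup.find_all(string=True):
    collapsed = re.sub(r"\s*\n\s*", "\n", node)
    if collapsed != node:
        node.replace_with(NavigableString(collapsed))

print(soup)
```

Because only text nodes are replaced, the nested b tag survives, which would not be the case if .string were assigned on the enclosing span.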

How to remove multiple empty lines when scraping with Beautifulsoup

4 answers: