Question

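Is there a one-liner that gets the text from a soup object, uses splitlines to turn it into a list of the lines in the HTML, and then removes all the extra empty entries that come from lines containing only a newline?

I don't want to write a second for loop that goes over the list again just to clean out the empty lines. Any other Pythonic approach is appreciated too.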

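from bs4 import BeautifulSoup

# Pass the open file, not the filename string, or the name itself gets parsed as markup.
with open('myhtml.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
sections = soup.findAll('div', class_='section')  # the tag name must be a string
lines = []
for section in sections:
    lines = lines + section.get_text().splitlines()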

Answer 1

Try a list comprehension inside the loop:

lines = lines + [l for l in section.get_text().splitlines() if l]
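
To see what the `if l` test keeps and drops, here is a small sketch on a plain string. The `text` value is made up and only stands in for whatever `section.get_text()` returns. Note that `if l` drops only truly empty strings, so a line holding just spaces survives unless you test `l.strip()` instead.

```python
# Hypothetical sample standing in for section.get_text(); not from the original post.
text = "Title\n\nFirst line\n   \n\nSecond line\n"

# Truthiness test: drops '' but keeps whitespace-only lines such as '   '.
non_empty = [l for l in text.splitlines() if l]

# Stricter variant: also drops lines that contain only whitespace.
non_blank = [l for l in text.splitlines() if l.strip()]

print(non_empty)  # ['Title', 'First line', '   ', 'Second line']
print(non_blank)  # ['Title', 'First line', 'Second line']
```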

Or use filter:

lines = lines + list(filter(None, section.get_text().splitlines()))
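
As a quick check that `filter(None, ...)` behaves like the `if l` comprehension, here is a sketch on an invented string (again only a stand-in for `section.get_text()`). Passing `None` as the function makes `filter` keep the truthy items, and in Python 3 it returns an iterator, which is why the `list(...)` call is there.

```python
# Hypothetical sample standing in for section.get_text().
text = "alpha\n\nbeta\n\n\ngamma"

# filter(None, iterable) keeps only truthy items, so '' is dropped.
with_filter = list(filter(None, text.splitlines()))

# The equivalent list comprehension, for comparison.
with_comprehension = [l for l in text.splitlines() if l]

print(with_filter)                        # ['alpha', 'beta', 'gamma']
print(with_filter == with_comprehension)  # True
```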

You can also shorten the assignment to

lines += ...
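
One thing worth knowing about the shorter form: for lists, `+=` extends the existing list in place, while `lines = lines + ...` builds a new list each time. A minimal sketch with made-up values (the names `alias` and `rebuilt` are only for illustration):

```python
lines = ["existing"]
alias = lines            # second name for the same list object

lines += ["appended"]    # in-place extend: alias sees the change
same_object = alias is lines

rebuilt = lines + ["extra"]   # concatenation creates a separate list
print(same_object)       # True
print(alias)             # ['existing', 'appended']
print(rebuilt is lines)  # False
```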

If you want to get rid of the loop altogether, this is how:

lines = [l for section in soup.findAll('div', class_='section')
           for l in section.get_text().splitlines() if l]
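
If the order of the two `for` clauses looks odd, it may help to see the comprehension next to the loops it replaces. This is only a sketch: `section_texts` is a hypothetical list of strings standing in for the text of each matched section.

```python
# Hypothetical stand-ins for the text of each matched section.
section_texts = ["Intro\n\nfirst", "\n\nsecond\nthird\n"]

# Explicit nested loops.
lines_loop = []
for text in section_texts:
    for l in text.splitlines():
        if l:
            lines_loop.append(l)

# Same logic as a comprehension: the for clauses are written
# in the same order as the nested loops above.
lines_comp = [l for text in section_texts for l in text.splitlines() if l]

print(lines_comp)  # ['Intro', 'first', 'second', 'third']
```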

Answer 2

Here is a true one-liner :)

from itertools import chain
lines = list(chain.from_iterable([l for l in section.get_text().splitlines() if l]
                   for section in soup.findAll('div', class_='section')))
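
For anyone unsure what `chain.from_iterable` contributes here: it flattens one level of nesting, so a sequence of per-section lists becomes one flat sequence. A rough sketch on made-up strings (`section_texts` is hypothetical and stands in for each section's text), including a variant that flattens first and filters afterwards:

```python
from itertools import chain

# Hypothetical stand-ins for the text of each matched section.
section_texts = ["a\n\nb", "", "c\n\n\nd\n"]

# Filter inside each section, then flatten the per-section lists.
per_section = ([l for l in text.splitlines() if l] for text in section_texts)
lines = list(chain.from_iterable(per_section))

# Alternative: flatten all lines first, then drop the empty ones once.
lines_alt = list(filter(None, chain.from_iterable(
    text.splitlines() for text in section_texts)))

print(lines)      # ['a', 'b', 'c', 'd']
print(lines_alt)  # ['a', 'b', 'c', 'd']
```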

BeautifulSoup, get_text(), splitlines(): how to remove empty lines in a Pythonic one-liner?
