Question

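I am scraping the date, title and content from USA Today. I can get the date, the title and even the content, but along with the content I also get some unwanted text. What should I change in my code to get only the content (the article body)?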

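# Assumed (not shown in the question): map surrogate escapes U+DC80..U+DCFF back to cp1252
tab = {0xDC00 + b: bytes([b]).decode("cp1252", errors="replace") for b in range(0x80, 0x100)}

with open("myfile.txt", encoding="utf-8", errors="surrogateescape") as f:
    for line in f:                     # valid UTF-8 has been decoded here
        line = line.translate(tab)     # and stray cp1252 bytes are recovered here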

I want the date, title and content for each article.
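One way to keep each article's fields together is to parse every page into a small record. The sketch below rests on assumptions: it parses an inline HTML string instead of a live USA Today page, takes the title from the first `<h1>` and the date from a `<time>` tag's `datetime` attribute (both hypothetical markup choices), and takes the body only from `<p class="p-text">` paragraphs, the class that Answer 1 relies on.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one downloaded article page.
SAMPLE_HTML = """
<html><body>
  <h1>Example headline</h1>
  <time datetime="2019-05-01">May 1, 2019</time>
  <p class="p-text">First paragraph.</p>
  <p class="caption">Photo caption we do not want.</p>
  <p class="p-text">Second paragraph.</p>
</body></html>
"""


def parse_article(html):
    """Return a dict with the date, title and body text of one article page."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find("h1")        # assumed location of the headline
    time_tag = soup.find("time")       # assumed location of the date
    # Only paragraphs with class "p-text" count as article text.
    paragraphs = soup.find_all("p", class_="p-text")
    return {
        "date": time_tag.get("datetime") if time_tag else None,
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "content": " ".join(p.get_text(strip=True) for p in paragraphs),
    }


articles = [parse_article(SAMPLE_HTML)]
print(articles[0])
```

In a real scraper the list would be built by calling `parse_article` on each downloaded page; the selectors should be checked against the actual page source first.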

Answer 1

I tried finding the content with

contentTag = sauce.find_all('p',{"class": "p-text"})

and then building the content string with this condition:

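# find_all() returns a ResultSet (a list subclass), so this check always passes
if isinstance(contentTag, list):
    # Keep only the stripped text of each <p class="p-text"> paragraph
    content = ' '.join(c.get_text().strip() for c in contentTag)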

It worked.
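To see why restricting the search to `p.p-text` drops the unwanted text, the sketch below compares the text of a whole container with the filtered paragraphs. The HTML, including the `article` and `caption` class names, is hypothetical; real USA Today markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical page: article paragraphs mixed with a caption and an ad label.
html = """
<div class="article">
  <p class="p-text">Paragraph one.</p>
  <p class="caption">Caption text</p>
  <aside>Advertisement</aside>
  <p class="p-text">Paragraph two.</p>
</div>
"""
sauce = BeautifulSoup(html, "html.parser")

# Unfiltered: every text node in the container, including the unwanted ones.
everything = sauce.find("div", class_="article").get_text(" ", strip=True)

# Filtered as in the answer: only <p class="p-text"> paragraphs.
contentTag = sauce.find_all("p", {"class": "p-text"})
content = " ".join(c.get_text().strip() for c in contentTag)

print(everything)
print(content)
```

The first string still contains the caption and the advertisement label; the second contains only the two article paragraphs.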

How can I collect the content of USA Today articles with BeautifulSoup in Python 3.7?
