Question

I have some data in a txt file, and I'm trying to find some specific words in it.

import re
from bs4 import BeautifulSoup


with open("myfile.txt") as f:
    soup = BeautifulSoup(f)

    print(soup.find_all("DLC"))

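There are at least 5 occurrences of DLC in the file, but the output is an empty list. I changed soup = BeautifulSoup(f) to soup = BeautifulSoup(f, "html.parser"), but that didn't help. Why does it return an empty list when I know the string is in the file? It also doesn't work on the website I extracted this data from. How can I fix this?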

Edit after comments, for example:

<h1>Fallout 4'ün Far Harbor DLC fragmanı yayımlandı!</h1>
<h2>Bethesda'nın yaptığı en geniş DLC geliyor</h2>

Answer 1

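When you call soup.find_all("DLC"), BeautifulSoup looks for DLC tags/elements on the page (the first argument is a tag name), not for text nodes that contain the text DLC.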
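A minimal sketch, assuming the markup from the question's edit is parsed with the built-in "html.parser". It shows that the first argument of find_all is matched against tag names:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's edit
html = """
<h1>Fallout 4'ün Far Harbor DLC fragmanı yayımlandı!</h1>
<h2>Bethesda'nın yaptığı en geniş DLC geliyor</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# The first argument is a tag name, so this searches for <DLC> tags: none exist
print(soup.find_all("DLC"))  # []

# Searching for a tag name that does exist returns the matching element
print([tag.name for tag in soup.find_all("h1")])  # ['h1']
```

So the empty list is expected: the file simply has no element named DLC.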

Instead, you meant to use the text argument (in modern BeautifulSoup versions the argument is called string instead of text):
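The lambda matters because a plain string passed as string= has to equal the whole text node. A sketch on the question's sample markup (assuming a bs4 version that accepts string=, i.e. 4.4.0 or later):

```python
from bs4 import BeautifulSoup

html = """
<h1>Fallout 4'ün Far Harbor DLC fragmanı yayımlandı!</h1>
<h2>Bethesda'nın yaptığı en geniş DLC geliyor</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the entire text node, so neither heading matches:
# their text only *contains* "DLC"
exact = soup.find_all(string="DLC")

# A function is called on each text node and can test for a substring instead
# (the "s and" guard skips empty/None values)
partial = soup.find_all(string=lambda s: s and "DLC" in s)

print(len(exact), len(partial))  # 0 2
```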

soup.find_all(text=lambda text: text and "DLC" in text)
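A compiled regular expression works as well. BeautifulSoup calls its .search() on each text node. The sketch below is illustrative only: the \b word boundaries and the occurrence count are additions, not part of the answer above. It also shows how to get the enclosing tags through .parent:

```python
import re
from bs4 import BeautifulSoup

html = """
<h1>Fallout 4'ün Far Harbor DLC fragmanı yayımlandı!</h1>
<h2>Bethesda'nın yaptığı en geniş DLC geliyor</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# The regex is searched anywhere inside each text node
pattern = re.compile(r"\bDLC\b")
matches = soup.find_all(string=pattern)

# Each match is a NavigableString; .parent is the tag that contains it
for text in matches:
    print(text.parent.name, "->", text.strip())

# Count every occurrence, not just the number of text nodes that contain one
total = sum(len(pattern.findall(text)) for text in matches)
print(total)  # 2
```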
