Question

OK, my coding is rusty, so I've been borrowing and adapting from tutorials.

I started playing with BeautifulSoup by opening a file:

import bs4

with open('event.html', encoding='utf8') as f:
    soup = bs4.BeautifulSoup(f, "lxml")

Later, I needed to find a string in the same file, and doing that with BS seemed more complicated, so I did:

lines = f.readlines()

and combined it with the earlier statements:

with open('event.html', encoding='utf8') as f:
    soup = bs4.BeautifulSoup(f, "lxml")
    lines = f.readlines()

What confuses me is that if I swap the two lines and write the block like this:

with open('event.html', encoding='utf8') as f:
    lines = f.readlines()
    soup = bs4.BeautifulSoup(f, "lxml")

then the rest of my code breaks. Why?

Answer 1

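The readlines function leaves the internal file pointer at the end of the file. I haven't used BeautifulSoup myself, but I assume it expects the file object it is given to be positioned at index 0 of the file. Seeking the file back to the beginning with f.seek(0) works around this.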

with open('event.html', encoding='utf8') as f:
    lines = f.readlines()
    f.seek(0)
    soup = bs4.BeautifulSoup(f, "lxml")
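
To see the file pointer behaviour in isolation, here is a minimal sketch that uses io.StringIO as an in-memory stand-in for the opened file. The sample HTML content is made up for illustration, and BeautifulSoup is deliberately left out so the snippet runs with the standard library alone.

```python
import io

# In-memory stand-in for an open text file (hypothetical content).
f = io.StringIO("<p>first</p>\n<p>second</p>\n")

lines = f.readlines()   # consumes everything; the pointer is now at the end
leftover = f.read()     # nothing left to read, so this is an empty string

f.seek(0)               # rewind to the beginning of the file
again = f.read()        # the full content is available again

print(len(lines), repr(leftover), again == "".join(lines))
```

Anything that reads from f after readlines() without the seek(0) sees an empty file, which is what the second reader in the swapped version runs into.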

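In the original order, BeautifulSoup presumably either resets the file pointer once it has finished reading or leaves it at the end of the file, in which case f.readlines() would quietly return an empty list instead of failing. Either way, that would explain why that order appears to work.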
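
An alternative that avoids depending on the pointer position at all is to read the file once and derive both values from the resulting string. The sketch below assumes ordinary newline-terminated text. read_once is a hypothetical helper name, and io.StringIO again stands in for the real file. The returned text could then be passed to bs4.BeautifulSoup(text, "lxml"), since BeautifulSoup accepts a string as well as a file object.

```python
import io

def read_once(f):
    """Read a file object a single time and return (text, lines)."""
    text = f.read()
    # keepends=True keeps the trailing newlines, like readlines() does
    lines = text.splitlines(keepends=True)
    return text, lines

# Hypothetical content standing in for event.html.
f = io.StringIO("<html>\n<body>event</body>\n</html>\n")
text, lines = read_once(f)

# Search the lines directly; text is still available for the parser.
matches = [ln for ln in lines if "event" in ln]
print(matches)
```

With this approach the order of the two consumers no longer matters, because neither of them touches the file object.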