Python - How to read HTML line by line

Time: 2015-12-15 09:56:49

Tags: python html

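I am trying to write a program that takes an HTML file and outputs each line. I am doing something wrong, because my code outputs every single character instead. How can I add all of the HTML lines to a list?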

Here is the code so far:

f = open("/home/tony/Downloads/page1/test.html", "r")
htmltext = f.read()
f.close()

for t in htmltext:
    print t + "\n"

2 answers:

Answer 0 (score: 2)

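You can use f.readlines() instead of f.read(). This method returns a list of all the lines in the file, each still ending with its newline character.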

with open("/home/tony/Downloads/page1/test.html", "r") as f:
    for line in f.readlines():
        print(line)

Or you can use list(f), which builds the same list by iterating over the file object:

# The with block closes the file automatically once the list is built
with open("/home/tony/Downloads/page1/test.html", "r") as f:
    f_lines = list(f)
for line in f_lines:
    print(line)

Source: https://docs.python.org/3.5/tutorial/inputoutput.html
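Since the question asks for the lines in a list, here is a minimal sketch of that, assuming the trailing newline on each line is not wanted. The helper name read_html_lines and the temporary demo file are hypothetical and only there to make the example self-contained; with the real file you would pass your own path.

```python
import os
import tempfile


def read_html_lines(path):
    """Return the lines of the file at `path` as a list, without trailing newlines."""
    with open(path, "r") as f:
        # Iterating over a file object yields one line at a time;
        # rstrip("\n") drops the newline each line carries.
        return [line.rstrip("\n") for line in f]


# Demo on a throwaway file (hypothetical content, not the asker's page)
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n")
    demo_path = tmp.name

html_lines = read_html_lines(demo_path)
os.remove(demo_path)

for line in html_lines:
    print(line)
```

Stripping the newline here also avoids the blank line that print(line) would otherwise add after every line of output.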

Answer 1 (score: 1)

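f.read() reads the entire file up to EOF and returns it as a single string, so looping over that string gives you one character at a time. What you want is the f.readlines() method: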

with open("/home/tony/Downloads/page1/test.html", "r") as f:
    for line in f.readlines():
        print(line) # The newline is included in line
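To make the difference concrete, here is a small sketch that compares the two calls. It uses io.StringIO as a stand-in for the opened file so it runs without any file on disk; the sample HTML string is made up for illustration.

```python
import io

# An in-memory text stream standing in for the opened HTML file
sample = io.StringIO("<html>\n<body>\n</html>\n")

# read() returns one string; iterating over a string yields characters
text = sample.read()
first_chars = list(text)[:6]

# splitlines() turns that same string into lines without the newlines
lines_from_text = text.splitlines()

# readlines() returns a list of lines, each keeping its trailing newline
sample.seek(0)
lines_with_newlines = sample.readlines()

print(first_chars)          # ['<', 'h', 't', 'm', 'l', '>']
print(lines_from_text)      # ['<html>', '<body>', '</html>']
print(lines_with_newlines)  # ['<html>\n', '<body>\n', '</html>\n']
```

If you have already called read(), text.splitlines() is one way to get the lines without reopening the file.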