Question

I want to read an HTML file in Python 3.4.3.

I tried:

import urllib.request
fname = r"C:\Python34\html.htm"
HtmlFile = open(fname,'w')
print (HtmlFile)

This prints:

<_io.TextIOWrapper name='C:\\Python34\\html.htm' mode='w' encoding='cp1252'>

I want to get the HTML source code so that I can parse it with Beautiful Soup.

Answer 1

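You have to read the contents of the file. open() only returns a file object, which is what your print shows. Also open the file in 'r' mode: 'w' opens it for writing and truncates any existing content.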

# The with-block closes the file once the content has been read
with open(fname, 'r', encoding='utf-8') as HtmlFile:
    source_code = HtmlFile.read()
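
As a self-contained illustration of the same idea, here is a minimal sketch. The helper name read_html and the throwaway demo file are assumptions made for this example, not part of the original question; the demo file stands in for C:\Python34\html.htm so the code runs anywhere.

```python
import os
import tempfile


def read_html(path, encoding="utf-8"):
    """Return the full text of the HTML file at `path` (hypothetical helper)."""
    # 'r' opens the file for reading; the with-block closes it afterwards.
    with open(path, "r", encoding=encoding) as html_file:
        return html_file.read()


# Demo: write a small throwaway HTML file, then read it back.
sample = "<html><body><p>Hello</p></body></html>"
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "html.htm")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write(sample)
    source_code = read_html(demo_path)

print(source_code)
```

If Beautiful Soup is installed, the resulting string can then be passed to it, for example as BeautifulSoup(source_code, 'html.parser').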

Answer 2

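I was trying to read an HTML file saved in a folder. I tried the code that Vikasa mentioned, but it raised an error. So I changed the code and tried again, and it worked for me. The code is as follows: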

    fname = 'page_source.html'  # this HTML file is stored in the same folder as the script
    html_file = open(fname, 'r')
    source_code = html_file.read()

Print the HTML page using:

    print(source_code)

This prints the content that was read from the page_source.html file.
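
This answer does not say which error occurred. Note that leaving out the encoding argument makes open() use the platform default, which the question's output shows as cp1252. If the error was a decoding error, one possible approach is to try several encodings in turn. The sketch below assumes that scenario; the helper name read_text_with_fallback and the candidate encodings are illustrative choices, not something stated in the thread.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return the file's text using the first encoding that decodes cleanly."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as html_file:
                return html_file.read()
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next candidate encoding.
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were supplied")
    raise last_error


# Demo: the byte 0xE9 is an e-acute in cp1252 but is not valid UTF-8 here,
# so the first attempt fails and the second one succeeds.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "page_source.html")
    with open(demo_path, "wb") as out:
        out.write(b"<p>caf\xe9</p>")
    source_code = read_text_with_fallback(demo_path)
```

The order of the encodings matters: cp1252 accepts almost any byte sequence, so it is best placed after stricter encodings such as UTF-8.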

Reading an HTML file from a folder in Python
