How do I write a website's HTML page to a CSV file?

Time: 2020-09-04 22:50:49

Tags: python html csv web

When I try to write a web page's HTML to my_html.html, the error below pops up. Please advise on how to write the file successfully.

Error:

  File "C:\Users\DRB\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u21e3' in position 84032: character maps to <undefined>

1 answer:

Answer 0: (score: 1)

Try opening the output file in binary mode and saving the response's .content instead of its .text:

import requests

def url_to_file(url, fname="web_txt.html"):
    response = requests.get(url)
    html_content = response.content         # <-- use .content
    if response.status_code == 200:
        with open(fname, "wb") as r:        # <-- open file in binary mode
            r.write(html_content)

        return html_content.decode('utf-8', 'ignore')   # <-- decode content as utf-8

    return "Failed to perform its task."

url = "https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/"
print(url_to_file(url))

Prints:

<!DOCTYPE html>
<!--[if IE 7]>
<html class="ie ie7" lang="en-US" prefix="og: http://ogp.me/ns#">
<![endif]-->

...

and saves web_txt.html.
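If you would rather keep working with text (for example response.text), a possible alternative is to open the file in text mode with an explicit encoding, so Python does not fall back to the platform default (cp1252 in the traceback above). The sketch below is an assumption on my part, not part of the original answer: save_html_text is a hypothetical helper, and the sample string stands in for a downloaded page.

```python
import os
import tempfile


def save_html_text(html_text, fname):
    # Hypothetical helper (not from the original answer).
    # encoding="utf-8" overrides the platform default encoding,
    # which is what triggered the 'charmap' error in the question.
    with open(fname, "w", encoding="utf-8") as f:
        f.write(html_text)


# '\u21e3' is the character named in the traceback; cp1252 cannot encode it.
sample = "<p>\u21e3</p>"
try:
    sample.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Write to a temporary directory and read the file back as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "web_txt.html")
save_html_text(sample, path)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Binary mode, as in the answer, writes the server's bytes unchanged; the text-mode variant re-encodes whatever string you pass in as UTF-8, so which one fits depends on whether you need the original bytes.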