Question

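I am trying to extract the text from a web page, and it raises this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence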

I have searched and tried textFile.write(str(results).decode('utf-8')), but that raises an AttributeError.

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

try :
    os.mkdir(outputDir)
    print("output directory generated")
except :
    print("using existing directory")

textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()

Is there any way to convert the encoding of str(results) and save it properly?

The Python version is 3.7.3.

Answer 1

Specify the encoding explicitly when opening the file, as in this example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

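try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    # Only swallow the "directory already exists" case, not every error.
    print("using existing directory")

# Pass the encoding explicitly; otherwise open() falls back to the locale default (cp949 here).
with open(outputDir + '/output.txt', mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))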
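
To see why this works, here is a minimal sketch that reproduces the problem without the network request. The sample string and the temporary file path are made up for illustration; only the '\xa9' character and the cp949 codec come from the traceback in the question. In Python 3, open() in text mode uses the locale's preferred encoding when none is given, which appears to be cp949 on the asker's machine, and that codec cannot represent '\xa9' (the copyright sign). The sketch also shows why the .decode('utf-8') attempt fails: a Python 3 str has no decode method.

```python
import os
import tempfile

# Hypothetical sample text; '\xa9' is the character named in the traceback.
text = "Copyright \xa9 2019"

# 1. cp949 cannot encode '\xa9', which is what open(..., 'w') hit implicitly.
try:
    text.encode("cp949")
    cp949_ok = True
except UnicodeEncodeError:
    cp949_ok = False

# 2. str has no .decode() in Python 3, hence the AttributeError.
has_decode = hasattr(text, "decode")

# 3. Writing with an explicit UTF-8 encoding succeeds and round-trips.
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # assumed scratch location
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp949_ok, has_decode, round_trip == text)
```

If the file really has to stay in cp949, passing errors="replace" to open() is another option, at the cost of replacing every unencodable character with a question mark.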
