Python raises a Unicode encoding error when saving a file

Time: 2019-09-10 10:33:15

Tags: python unicode beautifulsoup

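I am trying to get the text from a web page, and it produces this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence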

I have searched and tried textFile.write(str(results).decode('utf-8')), but that raises an AttributeError.
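A minimal sketch of why that attempt fails, assuming Python 3: str is already Unicode text and has no decode method, so only bytes can be decoded. The variable names here are hypothetical and not part of the script.

```python
# In Python 3, str is Unicode text; only bytes objects have .decode()
sample_text = "Copyright \xa9 2019"  # hypothetical stand-in for str(results)

str_has_decode = hasattr(sample_text, "decode")  # str has no decode method

# The valid direction is str -> bytes via encode(), then bytes -> str via decode()
encoded = sample_text.encode("utf-8")
decoded = encoded.decode("utf-8")

print(str_has_decode, type(encoded).__name__, decoded == sample_text)
```

Calling sample_text.decode('utf-8') directly would raise AttributeError, which matches the error seen with the attempted fix.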

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

try :
    os.mkdir(outputDir)
    print("output directory generated")
except :
    print("using existing directory")

textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()

Is there a way to convert the encoding of str(results) so that it saves correctly?

The Python version is 3.7.3.

1 answer:

Answer 0: (score: 0)

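Specify the encoding when opening the file, as in this example. Without an explicit encoding, open() falls back to the locale's default (cp949 here), which cannot encode '\xa9'.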

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import os
from bs4 import BeautifulSoup

outputFolderName = "output"

currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName

r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)

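try:
    os.mkdir(outputDir)
    print("output directory generated")
except FileExistsError:
    # Only treat "directory already exists" as expected; other errors still surface
    print("using existing directory")

# An explicit encoding keeps the write independent of the locale default (cp949 here)
with open(outputDir + '/output.txt', mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))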
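The difference can be reproduced without a network request. The following is a minimal sketch, assuming a short sample string in place of the scraped page and a temporary directory in place of the output folder (both hypothetical): it shows that cp949 rejects '\xa9' while a file opened with an explicit UTF-8 encoding accepts it.

```python
import os
import tempfile

# Hypothetical sample text; '\xa9' is the character named in the traceback
text = "Copyright \xa9 2019"

# cp949 cannot encode '\xa9', which is what the failing write ran into
try:
    text.encode("cp949")
    cp949_ok = True
except UnicodeEncodeError:
    cp949_ok = False

# With an explicit UTF-8 encoding the write succeeds regardless of the locale
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm nothing was lost
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp949_ok, round_trip == text)
```

Any program that later reads output.txt should also open it with encoding='utf-8'. Otherwise the same locale default applies on the reading side.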