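I am trying to get the text from a web page, and it produces this error:

Traceback (most recent call last):
  File "C:\Users\username\Desktop\Python\parsing.py", line 21, in <module>
    textFile.write(str(results))
UnicodeEncodeError: 'cp949' codec can't encode character '\xa9' in position 37971: illegal multibyte sequence

I searched and tried textFile.write(str(results).decode('utf-8')), but that raises an AttributeError.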
import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName
r = requests.get('https://yahoo.com/')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try:
    os.mkdir(outputDir)
    print("output directory generated")
except:
    print("using existing directory")
textFile = open(outputDir + '/output.txt', 'w')
textFile.write(str(results))
textFile.close()
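Is there a way to convert the encoding of str(results) and save it properly?
The Python version is 3.7.3.
Answer 0 (score: 0)
Specify the encoding explicitly when you open the file, as in this example. Without an encoding argument, open() falls back to the locale's default encoding (cp949 here), which cannot represent '\xa9'.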
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
import os
from bs4 import BeautifulSoup
outputFolderName = "output"
currentPath = os.path.dirname(os.path.realpath(__file__))
outputDir = currentPath + "/" +outputFolderName
r = requests.get('https://yahoo.com')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.findAll(text=True)
try:
    os.mkdir(outputDir)
    print("output directory generated")
except:
    print("using existing directory")
# Pass an explicit encoding so the write does not depend on the locale default (cp949 here).
with open(outputDir + '/output.txt', mode='w', encoding='utf8') as textFile:
    textFile.write(str(results))
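To see the cause in isolation, without the network request, here is a minimal sketch. It assumes the only problem is the copyright sign '\xa9' from the traceback. The sample string, the helper can_encode, and the temporary file path are hypothetical names used for illustration, not part of the original script.

```python
import os
import tempfile

# Sample text containing the character named in the traceback (assumed example).
text = "Copyright \xa9 2019"


def can_encode(s, encoding):
    """Return True if every character of s can be encoded with the given codec."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp949 is the codec named in the error message; utf-8 covers all of Unicode.
cp949_ok = can_encode(text, "cp949")
utf8_ok = can_encode(text, "utf-8")

# Write and read back with an explicit encoding, independent of the locale default.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write(text)
with open(path, mode="r", encoding="utf-8") as f:
    round_trip = f.read()

print(cp949_ok, utf8_ok, round_trip == text)
```

In Python 3, str is already Unicode text and has no decode method, which is why the .decode('utf-8') attempt fails. The conversion to bytes happens inside the file object, so the encoding has to be chosen in the open() call.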