Question

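I have written a script in Python that uses a POST request to scrape JSON content from a web page. When I run the script, I get the results in the console as expected. However, I run into a problem when I try to write the same content to a CSV file. When I try it like this:
with open ("outputContent.csv","w",newline="") as f: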

I get the following error:

Traceback (most recent call last):
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\all_reviews_grabber.py", line 27, in <module>
    writer.writerow([nom,ville,region])
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufb02' in position 16: character maps to <undefined>
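The failing step can probably be reproduced on its own, without the network request. A minimal sketch, assuming the file was opened with cp1252 (which is what the cp1252.py frame in the traceback suggests): '\ufb02' is the "fl" ligature (U+FB02), and cp1252 simply has no byte for it, while UTF-8 does.

```python
# Reproduce the UnicodeEncodeError from the traceback without any network access.
# '\ufb02' is the Latin small ligature "fl" (U+FB02); cp1252 has no byte for it.
ch = "\ufb02"

try:
    ch.encode("cp1252")  # the codec the traceback shows being used for the file
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)

# The same character encodes without trouble in UTF-8 (3 bytes).
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xef\xac\x82'
```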

When I try the following instead, the script does produce a CSV file, but with garbled data:

with open ("outputContent.csv","w",newline="",encoding="utf-8") as f:

However, the CSV file contains some unreadable content, such as:

BeijingshÃ¬
XinjiangwÃ©iwÃºerzÃ¬zhÃ¬qu
ShÃ nghaishÃ¬
Qingpuqu
ShÃ nghaishÃ¬
XÃºhuÃ¬qu
PutuÃ³qu
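One way to check whether this output is merely mis-displayed (rather than corrupted on disk) is to reverse the suspected mistake. A small sketch, assuming the viewer decoded UTF-8 bytes as cp1252: re-encoding the garbled string with cp1252 and decoding it as UTF-8 should give back readable text.

```python
# Mojibake check: re-encode the garbled text with cp1252 and decode it as UTF-8.
# If the round trip recovers readable text, the bytes in the file are valid UTF-8
# and only the program used to view them chose the wrong encoding.
garbled = "XinjiangwÃ©iwÃºerzÃ¬zhÃ¬qu"  # copied from the CSV as it was displayed

recovered = garbled.encode("cp1252").decode("utf-8")
print(recovered)  # Xinjiangwéiwúerzìzhìqu
```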

Here is my script so far:

import csv
import requests
from bs4 import BeautifulSoup

baseUrl = "https://fr-vigneron.gilbertgaillard.com/importer"
postUrl = "https://fr-vigneron.gilbertgaillard.com/importer/ajax"

with requests.Session() as s:
    req = s.get(baseUrl)
    sauce = BeautifulSoup(req.text,"lxml")
    token = sauce.select_one("input[name='_token']")['value']

    payload = {
        'data': 'country=0&type=0&input_search=',
        '_token': token
        }

    res = s.post(postUrl,data=payload)
    with open ("outputContent.csv","w",newline="",encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(['nom','ville','region'])
        for item in res.json():
            nom = item['prospect_nom']
            ville = item['prospect_ville']
            region = item['prospect_region']
            print(nom,ville,region)
            writer.writerow([nom,ville,region])

How do I write the content to the CSV file correctly?

Answer 1

Take a look at this: http://www.pgbovine.net/unicode-python-errors.htm

Check the default encoding in the interpreter:

import sys

sys.stdout.encoding
Older versions of Python can also cause this error.
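Two different defaults are involved here, and it may help to look at both. A short sketch: sys.stdout.encoding is what print() encodes to, while open() without an encoding= argument falls back to locale.getpreferredencoding(False) (on the asker's Windows setup that would presumably be cp1252, matching the traceback).

```python
import locale
import sys

# Encoding used by print() for console output (None if stdout has no encoding attribute).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding that open() falls back to when no encoding= argument is passed.
default_file_encoding = locale.getpreferredencoding(False)

print("sys.stdout.encoding:", stdout_encoding)
print("open() default:", default_file_encoding)
```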

Answer 2

Would parsing the data with pandas and then writing it out help with the problem?


Answer 3

The code works fine as long as you remove the print statement.^*

The garbled data you are seeing appears because, when you view the file, its bytes are being decoded as cp1252 instead of UTF-8.

>>> s = 'Xinjiangwéiwúerzìzhìqu'
>>> encoded = s.encode('utf-8')
>>> encoded.decode('cp1252')
'XinjiangwÃ©iwÃºerzÃ¬zhÃ¬qu'

If you want to view the data by opening the CSV file in Python, make sure you specify the UTF-8 encoding when you open it:

open('outputContent.csv', 'r', encoding='utf-8'...
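A complete round trip might look like the sketch below. The sample rows are hypothetical stand-ins for the scraped nom/ville/region values, and a temporary directory is used instead of the real output path. Writing and reading with the same explicit encoding keeps the accented characters intact.

```python
import csv
import os
import tempfile

# Sample rows standing in for the scraped JSON fields (hypothetical values).
rows = [["nom", "ville", "region"],
        ["Example importer", "Xúhuìqu", "Shànghaishì"]]

path = os.path.join(tempfile.mkdtemp(), "outputContent.csv")

# Write with an explicit UTF-8 encoding, as the question's second attempt does.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read back with the same encoding; the non-ASCII characters survive unchanged.
with open(path, "r", newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back == rows)  # True
```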

If you want to open the file in an application such as Excel, make sure you choose UTF-8 as the encoding when opening it.

If you do not specify an encoding, the default cp1252 encoding is used to decode the data in the file, and you will see garbage.
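A possible alternative to choosing the encoding by hand in Excel (not something the answer itself proposes) is to write the file with Python's "utf-8-sig" codec, which prepends a UTF-8 byte order mark. Excel commonly uses that BOM to detect UTF-8 when a CSV is opened directly. A sketch with hypothetical sample data:

```python
import codecs
import csv
import os
import tempfile

rows = [["nom", "ville", "region"],
        ["Example importer", "Qingpuqu", "Shànghaishì"]]  # hypothetical sample data

path = os.path.join(tempfile.mkdtemp(), "outputContent.csv")

# "utf-8-sig" writes a UTF-8 byte order mark (EF BB BF) before the data,
# which Excel commonly uses to recognise the file as UTF-8.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)

with open(path, "rb") as f:
    starts_with_bom = f.read(3) == codecs.BOM_UTF8

# Reading with "utf-8-sig" strips the BOM again, so the header comes back clean.
with open(path, "r", newline="", encoding="utf-8-sig") as f:
    header = next(csv.reader(f))

print(starts_with_bom, header)  # True ['nom', 'ville', 'region']
```

Programs that read the file as plain "utf-8" will see the BOM as a leading '\ufeff' on the first header cell, so this choice mainly suits files meant for Excel.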

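^* print automatically encodes its output with the default console encoding, so if it tries to encode a character that cannot be represented in cp1252, you get an exception.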
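If you would rather keep the print call than delete it, one option is to make the console output lossy instead of fatal. The helper below is hypothetical (not from the answer): it encodes with the console's encoding using errors="replace", so unencodable characters print as "?" while the CSV still receives the original text.

```python
import sys

def to_console_safe(text, encoding=None):
    """Hypothetical helper: replace characters the console encoding cannot represent."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # errors="replace" turns each unencodable character into "?" instead of raising.
    return text.encode(encoding, errors="replace").decode(encoding)

# Simulate a cp1252 console: the "fl" ligature becomes "?", the accented letter survives.
print(to_console_safe("caf\u00e9 \ufb02", "cp1252"))  # café ?
```

In the question's loop this would become print(to_console_safe(nom), to_console_safe(ville), to_console_safe(region)), leaving writer.writerow([nom,ville,region]) unchanged.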

Unable to get rid of unreadable content when writing to a CSV file
