How to re-encode a CSV as UTF-8

Posted: 2017-06-15 17:54:19

Tags: python csv pandas encoding utf-8

Here is my code:

import tabula
import csv
import pandas as pd
import numpy as np
from pandas import ExcelWriter
from tkinter import *
from tkinter import filedialog
from tkinter.filedialog import askopenfilename
from tkinter.messagebox import showerror
import smtplib
from datetime import datetime
import time

root = Tk()
root.fileName = filedialog.askopenfilename(filetypes=(("PDF Files","*.pdf"),("All files","*.*")))

tabula.convert_into(root.fileName, "_ExportedPDF.csv", output_format="csv", pages="all")

path = root.fileName

with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)

    for index, row in enumerate(inputs):
        # Create file with no header
        if index == 0:
            continue
        output.writerow(row)

df = pd.read_csv('_ExportedPDF.csv')
# print (df)
root.destroy()

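Before I added the utf-8 clause to the conversion, I got an error saying the file couldn't be decoded. So I added the utf-8 encoding, and now I get this error instead: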

Traceback (most recent call last):
  File "C:/Users/BregmanM/Documents/Python/PDF to Excel/RiskSummary/risk-summary.py", line 29, in <module>
    output.writerow(row)
  File "C:\Users\BregmanM\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0512' in position 1: character maps to <undefined>
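The traceback fails inside `cp1252.py` while *writing*: the input file is opened with `encoding='utf-8'`, but `open(path + 'final.csv', 'w')` has no `encoding` argument, so Python 3.6 falls back to the locale's preferred encoding, which is cp1252 on this Windows machine. A minimal sketch of that diagnosis (`can_encode` is a hypothetical helper written for illustration, not part of any library):

```python
import locale

# open() without encoding= uses this locale-dependent default.
# On a Windows setup like the one in the traceback it is commonly "cp1252".
print("default encoding for open():", locale.getpreferredencoding(False))


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)  # strict error handling, like open()'s default
        return True
    except UnicodeEncodeError:
        return False


bad_char = "\u0512"  # the character reported in the traceback
print(can_encode(bad_char, "cp1252"))  # False: cp1252 has no mapping for it
print(can_encode(bad_char, "utf-8"))   # True: UTF-8 can represent it
```

So the write side needs an explicit `encoding=` as well; decoding the input as UTF-8 does nothing for the output file.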

Is there a simple way to make everything UTF-8? The CSV I export looks perfectly fine, so I'm confused about why I can't read it into a DataFrame.
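A minimal sketch of one way to produce a UTF-8 copy, assuming you know which encoding the exported CSV was actually written in (`src_encoding` is a placeholder you must set; `reencode_csv` is a hypothetical helper, not a tabula or pandas API). It reads the CSV with that encoding, drops the header row as the question's loop does, and writes the rows with an explicit `encoding="utf-8"` and `newline=""`. Note that in the code above `path` is `root.fileName`, the PDF picked in the dialog, so the `with` block opens the PDF as text rather than `_ExportedPDF.csv`; the sketch takes the CSV path explicitly.

```python
import csv
import os
import tempfile


def reencode_csv(src_path: str, dst_path: str, src_encoding: str,
                 skip_header: bool = True) -> int:
    """Copy the rows of src_path into dst_path, writing UTF-8.

    src_encoding is an assumption: it has to match the encoding the CSV was
    actually written in. Returns the number of rows written.
    """
    written = 0
    # newline="" is what the csv module expects for reading and writing;
    # it also avoids blank lines between rows on Windows.
    with open(src_path, "r", encoding=src_encoding, newline="") as infile, \
            open(dst_path, "w", encoding="utf-8", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for index, row in enumerate(reader):
            if skip_header and index == 0:
                continue  # drop the header row, as the question's loop does
            writer.writerow(row)
            written += 1
    return written


def _demo() -> None:
    # Build a small sample CSV in cp1252 (chosen only for illustration,
    # since it is the codec in the traceback) and convert it to UTF-8.
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "sample_cp1252.csv")
    dst = os.path.join(tmpdir, "sample_utf8.csv")
    with open(src, "w", encoding="cp1252", newline="") as f:
        csv.writer(f).writerows([["name", "city"], ["Café", "Zürich"]])
    print(reencode_csv(src, dst, src_encoding="cp1252"))  # 1
    with open(dst, "r", encoding="utf-8", newline="") as f:
        print(list(csv.reader(f)))  # [['Café', 'Zürich']]


if __name__ == "__main__":
    _demo()
```

The UTF-8 file can then be loaded with `pd.read_csv(dst_path, encoding="utf-8", header=None)`; `header=None` is needed because the header row was dropped. Alternatively, passing the source encoding straight to `pd.read_csv` (again, an assumption about what the exporter used) skips the intermediate file.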

0 answers:

No answers yet