Here is my code:
import tabula
import csv
import pandas as pd
import numpy as np
from pandas import ExcelWriter
from tkinter import *
from tkinter import filedialog
from tkinter.filedialog import askopenfilename
from tkinter.messagebox import showerror
import smtplib
from datetime import datetime
import time
root = Tk()
root.fileName = filedialog.askopenfilename(filetypes=(("PDF Files","*.pdf"),("All files","*.*")))
tabula.convert_into(root.fileName, "_ExportedPDF.csv", output_format="csv", pages="all")
path = root.fileName
with open(path, 'r', encoding='utf-8', errors='ignore') as infile, open(path + 'final.csv', 'w') as outfile:
    inputs = csv.reader(infile)
    output = csv.writer(outfile)
    for index, row in enumerate(inputs):
        # Create file with no header
        if index == 0:
            continue
        output.writerow(row)
df = pd.read_csv('_ExportedPDF.csv')
# print (df)
root.destroy()
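Before I added the utf-8 encoding clause to open(), I was getting an error saying the file couldn't be decoded. So I added the utf-8 encoding, and now I get this error instead: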
Traceback (most recent call last):
  File "C:/Users/BregmanM/Documents/Python/PDF to Excel/RiskSummary/risk-summary.py", line 29, in <module>
    output.writerow(row)
  File "C:\Users\BregmanM\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0512' in position 1: character maps to <undefined>
Is there a simple way to make everything utf-8? The CSV that gets exported looks perfect, so I'm confused about why I can't read it into a dataframe.
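
For reference, the traceback points at the write side: the output file is opened without an encoding argument, so on this Windows setup it falls back to cp1252, which has no mapping for '\u0512'. Below is a minimal sketch of what "everything utf-8" could look like, assuming the intent was to read the exported "_ExportedPDF.csv" (in the code above, `path` is the selected PDF itself). The helper name `copy_without_header` and the temporary demo files are hypothetical and only there to make the example self-contained.

```python
import csv
import os
import tempfile


def copy_without_header(src_path, dst_path, encoding="utf-8"):
    """Copy a CSV file, dropping its first row; both files use `encoding`."""
    # newline="" is what the csv module documentation recommends for
    # files handed to csv.reader / csv.writer.
    with open(src_path, "r", encoding=encoding, newline="") as infile, \
         open(dst_path, "w", encoding=encoding, newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        next(reader, None)        # skip the header row, if there is one
        writer.writerows(reader)  # write every remaining row unchanged


# Demo with a throwaway file containing the character from the traceback.
# The file names mirror the question but live in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "_ExportedPDF.csv")
dst = os.path.join(tmp_dir, "final.csv")

with open(src, "w", encoding="utf-8", newline="") as f:
    f.write("name,value\r\n\u0512abc,1\r\n")

copy_without_header(src, dst)

with open(dst, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))  # the header is gone, '\u0512' survives
```

If the resulting file is then loaded with pandas, passing the same encoding explicitly, as in `pd.read_csv(dst, encoding="utf-8")`, keeps the read side consistent with the write side.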