How do I handle UTF-8 encoding errors?

Time: 2019-04-12 16:55:34

Tags: python encoding error-handling

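I have a program that reads .txt files saved with UTF-8 encoding and replaces all text between two tags with an empty string (to remove the paratextual information from the text). The program runs fine on a Mac, but when I run it on Windows I get an error. The error message is as follows: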

Traceback (most recent call last):
  File "C:\Users\hadleyj\Desktop\QuantQual_program.py", line 1124, in <module>
    menu()
  File "C:\Users\hadleyj\Desktop\QuantQual_program.py", line 1092, in menu
    separate_paratext()
  File "C:\Users\hadleyj\Desktop\QuantQual_program.py", line 609, in separate_paratext
    menu()
  File "C:\Users\hadleyj\Desktop\QuantQual_program.py", line 1094, in menu
    remove_paratext()
  File "C:\Users\hadleyj\Desktop\QuantQual_program.py", line 636, in remove_paratext
    text=t.read()
  File "C:\Users\hadleyj\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 53377: character maps to <undefined>
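
The traceback shows the file being decoded by cp1252.py, the Windows default codec, rather than as UTF-8. The sketch below reproduces the same kind of failure in isolation. It assumes, purely for illustration, that the offending character is a right curly quote (U+201D), whose UTF-8 form ends in the byte 0x9d. The actual character at position 53377 of the file is not known.

```python
# Illustrative assumption: U+201D (right double quotation mark) is one
# character whose UTF-8 encoding contains the byte 0x9d.
utf8_bytes = "\u201d".encode("utf-8")  # three bytes: e2 80 9d

try:
    # Decoding UTF-8 bytes with the Windows default codec (cp1252).
    utf8_bytes.decode("cp1252")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    # The byte that cp1252 has no mapping for.
    failing_byte = exc.object[exc.start]
    print(exc)

# The same bytes decode cleanly when the codec matches the file.
roundtrip = utf8_bytes.decode("utf-8")
```

On a Mac the default text encoding is normally UTF-8, which would explain why the same file reads without error there.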

I tried adding 'encoding = utf-8' when reading the file, but the error persists. The code is below.

    u=input('If you want to return to options, type "1": ')

    while True:
        if u == '1':
            menu()
            break
        else:
            break

    print('\n\nREMINDER: TO RUN THIS PROGRAM YOUR PARATEXTUAL INFORMATION MUST BE TAGGED USING THE FOLLOWING TAG FORMAT: [@PARAST@] AND [@PARAPFN@]')

    while True:
        try:
            file_to_open =Path(input("\nYOU SELECTED OPTION 3: REMOVE PARATEXT. Please, insert your file path: "))

            with open(file_to_open) as t:
                text=t.read()
                break
        except FileNotFoundError:
            print("\nFile not found. Better try again")
        except IsADirectoryError:
            print("\nIncorrect Directory path.Try again")


    pat=re.compile(r'(\[@PARAST@\]).+?(\[@PARAFN@\])', flags=re.DOTALL)


    s = re.sub(pat, '', text)

    user=input('\n\n1. Create a folder for the file \n\n2.Select a directory for your files \n\n3. Go to menu \n\n.Selection: ')


    if user == '1':
        folder_path=Path(input('\n\nEnter your folder path: '))

        file_name=input('\n\nName your file. Extension not needed: ')
        fil_name=file_name+'.txt'
        try:
            os.makedirs(folder_path)
        except FileExistsError:
            print("This folder already exists. Try another name.")
        file=os.path.join(folder_path,fil_name)
        with open(file, 'w') as f:
            f.write(s)
        print('\n\nText named', fil_name, 'written to a file. Check folder named',folder_path, 'in your directory')
    elif user == '2':
        folder_path=Path(input('\n\nEnter your chosen directory: '))
        file_name=input('\n\nName your file. Extension not needed: ')
        f_name=file_name+'.txt'
        file_path=os.path.join(folder_path, f_name)
        with open(file_path, 'w') as f:
            f.write(s)
        print('\n\nText named', f_name, 'written to a file. Check folder: ', folder_path)
    else:
        print('Ok')

How can I avoid this encoding error? Is there something wrong with my code? Can anyone help?
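
One possible approach, offered as a minimal sketch rather than a confirmed fix: the posted code calls open(file_to_open) and open(file, 'w') without an encoding, so on Windows both fall back to cp1252. Passing encoding="utf-8" as a keyword argument to every open call makes reading and writing independent of the platform default. The traceback still passes through cp1252.py, which would be consistent with the encoding argument not having reached the open call at line 636, though that is only a guess. The helper names read_utf8, write_utf8 and strip_paratext, and the temporary demo files, are hypothetical and not part of the original program.

```python
import re
import tempfile
from pathlib import Path

# Pattern copied from the question's code.
PARATEXT = re.compile(r'(\[@PARAST@\]).+?(\[@PARAFN@\])', flags=re.DOTALL)


def read_utf8(path):
    """Read a text file as UTF-8, ignoring the platform default codec."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def write_utf8(path, text):
    """Write text as UTF-8 so the output matches the input encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def strip_paratext(text):
    """Remove everything between the two tags, tags included."""
    return PARATEXT.sub("", text)


# Demo on a throwaway file containing curly quotes (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.txt"
    target = Path(tmp) / "output.txt"
    sample = "Body \u201cquoted\u201d [@PARAST@]note[@PARAFN@] end"
    source.write_bytes(sample.encode("utf-8"))

    cleaned = strip_paratext(read_utf8(source))
    write_utf8(target, cleaned)
    result = read_utf8(target)
```

If the input file turns out not to be valid UTF-8 after all, open also accepts an errors argument (for example errors="replace"), at the cost of silently altering the undecodable bytes.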

0 answers:

No answers