Question

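I have an Arabic string in windows-1256 and I need to convert it to ascii so that I can send it to html2text. But when I run the code below, it fails with an error saying that the str object has no attribute 'decode'.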

filename=codecs.open(keyworddir + "\\" + item, "r", encoding = "windows-1256")
outputfile=filename.readlines()
file=open(keyworddir + "\\" + item, "w")
for line in outputfile:
    line=line.decode(encoding='windows-1256')
    line=line.encode('UTF-8')
    file.write(line)
file.close()

Answer 1

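In Python 3, str is already a decoded Unicode string, so you cannot decode line a second time.

What you missed is that the decoding happens implicitly when the file is read. codecs.open with mode "r" reads the file as a text file in the given encoding and decodes all of the text automatically.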
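To make the distinction concrete, here is a small sketch of the two types involved. The sample word is just an illustrative value, not taken from the asker's file.

```python
# bytes is what sits on disk; str is what you get after decoding.
raw = "سلام".encode("windows-1256")   # bytes, as a windows-1256 file would store it
text = raw.decode("windows-1256")     # str, already Unicode

# Only bytes has a .decode() method in Python 3; str does not.
print(type(raw).__name__, hasattr(raw, "decode"))
print(type(text).__name__, hasattr(text, "decode"))
```

Calling text.decode(...) here would raise the same AttributeError as in the question, because text is already a str.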

So you can either:

Open the file in binary mode: filename=open(keyworddir + "\\" + item, "rb"); the lines are then bytes, and bytes can be decoded
Or, better, simply remove the redundant decode: ~~line=line.decode(encoding='windows-1256')~~

Note:
You should consider opening the output file with codecs.open(keyworddir + "\\" + item, "w", encoding = "utf-8"), so that there is no need to encode line manually.
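Putting the advice above together, a minimal sketch might look like the following. The helper name convert_to_utf8 and the sample file are hypothetical, and the built-in open with an encoding argument is used here in place of codecs.open; in Python 3 both decode on read and encode on write. Letting the file object do the encoding also avoids the TypeError that Python 3 raises when bytes are written to a file opened in text mode "w".

```python
import os
import tempfile

def convert_to_utf8(path, source_encoding="windows-1256"):
    """Rewrite the file at `path` in place as UTF-8 (hypothetical helper)."""
    # Text mode decodes while reading, so `text` is already a str.
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    # Text mode encodes while writing, so no manual .encode() is needed.
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

# Demo on a throwaway file so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write("سلام\n".encode("windows-1256"))
    convert_to_utf8(path)
    with open(path, "rb") as f:
        converted = f.read()
```

The whole file is read before it is reopened for writing, which mirrors the readlines() approach in the question and is fine for files that fit in memory.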

Answer 2

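I ran into a similar problem and spent 5 days trying to solve it; in the end I used the following solution.

Before opening the file, run this command on the command line (a Linux command line, of course)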

iconv -f 'windows-1256' -t 'utf-8' '[your file name]' -o '[output file name]'

You can then use the following Python function to run that command automatically from your Python code

import subprocess

def run_cmd(cmd):
    # Run a shell command and wait for it to finish.
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, shell=True, check=True)
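As a variation on the same idea, the command can be passed as an argument list instead of a shell string, which sidesteps quoting problems with file names that contain spaces. This is only a sketch: build_iconv_cmd and iconv_to_utf8 are hypothetical helper names, and the -o option assumes GNU iconv as in the command above.

```python
import subprocess

def build_iconv_cmd(src, dst, source_encoding="windows-1256"):
    """Return the iconv argument list for converting `src` to UTF-8 at `dst`."""
    return ["iconv", "-f", source_encoding, "-t", "utf-8", "-o", dst, src]

def iconv_to_utf8(src, dst, source_encoding="windows-1256"):
    """Run iconv without a shell; raises CalledProcessError on failure."""
    subprocess.run(build_iconv_cmd(src, dst, source_encoding), check=True)

# Building the command has no side effects, so it is safe to inspect first.
cmd = build_iconv_cmd("my file.txt", "my file.utf8.txt")
print(cmd)
```

Calling iconv_to_utf8("in.txt", "out.txt") would then perform the conversion, provided iconv is installed and on the PATH.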

Python 3 string has no decode (windows-1256)
