This is my code:
# -*- coding: utf-8-*-
array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ","ÿ"]
array1=["א","ב","ג","ד","ה","ו","ז","ח","ט","י","ך","כ","ל","ם","מ","ן","נ","ס","ע","ף","פ","ץ","צ","ק","ר","ש","ת"]
str="áï éäåãä"
message=""
for i in range(0,len(str)):
    s=str[i]
    index=-1
    for j in range(0,len(array)):
        if(array[j]==s):
            index=j
            break
    if(index!=-1):
        message+=array1[index]
        print array1[index]
print message
The error is:
SyntaxError: EOL while scanning string literal
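on line 2.
I have a Hebrew text file, but no matter which encoding I choose, it always displays as gibberish. This is a Python program to convert it to Hebrew. The original file is in ISO-8859-1.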
Answer 0 (score: 4)
You used ' where you should have used " for the last item, which is written 'ÿ":
array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ",'ÿ"]
Change that single quote to a double quote.
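To see the error in isolation, here is a minimal sketch that compiles a one-line list whose last item opens with ' and closes with ". The snippet string and variable names are made up for illustration. The exact message text differs between Python versions, so the sketch relies only on the exception type:

```python
# Hypothetical one-line program: the last item opens with ' but closes with ".
bad_source = "array = [\"a\", 'b\"]\n"

try:
    compile(bad_source, "<example>", "exec")
    raised = False
except SyntaxError as exc:
    # The tokenizer hits the end of the line while still inside the ' string.
    raised = True
    print("SyntaxError: %s" % exc.msg)
```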
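As for your translation program: it sounds as if your file is either encoded incorrectly or is being decoded incorrectly. Perhaps you should find out the correct encoding, rather than blindly replacing Latin-1 bytes with the UTF-8 sequences for Hebrew code points?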
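If you use the codecs module to open the file with the right codec and decode it to Unicode, you will most likely find that the data was encoded correctly all along.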
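A minimal sketch of that approach follows. The file name and the sample bytes are invented for the example, and cp1255 is only a guess at the real encoding, so substitute whichever codec your file actually uses:

```python
import codecs
import os
import tempfile

# Hypothetical sample data: seven bytes as a cp1255 editor might have saved them.
raw = b"\xe1\xef \xe9\xe4\xe5\xe3\xe4"

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "hebrew.txt")
with open(path, "wb") as f:
    f.write(raw)

# Open with the codec the file was written in (assumed here to be cp1255);
# reading then yields Unicode text instead of raw bytes.
with codecs.open(path, "r", encoding="cp1255") as f:
    text = f.read()

# Save a UTF-8 copy so other tools do not have to guess the encoding.
out_path = path + ".utf8"
with codecs.open(out_path, "w", encoding="utf-8") as f:
    f.write(text)
```

With the right codec there is no need for a character-by-character lookup table; the decoding step does the whole conversion.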
I strongly urge you to read up on Unicode, codecs, and Python before you continue.
Answer 1 (score: 3)
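As @Martijn suggests, decoding the original file correctly would be a better solution. If your file is Hebrew but displays the characters in array, it is probably being displayed in the latin1 or cp1252 encoding. cp1255 looks close. Perhaps your array1 is not quite right. Also note that strings are iterable, so you can simplify your arrays: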
# coding: utf8
array = u'àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ'
array1 = u'אבגדהוזחטיךכלםמןנסעףפץצקרשת'
print(array)
print(array1)
print(array.encode('cp1252').decode('cp1255',errors='replace'))
The last line above reverses the "wrong" encoding and decodes the result with cp1255 (a Hebrew encoding). Output:
àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ
אבגדהוזחטיךכלםמןנסעףפץצקרשת
אבגדהוזחטיךכלםמןנסףפץצרשת���
It is not a perfect match, but it is close enough that I think your original file was encoded in cp1255.
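As a quick check, the same round trip can be applied to the sample string from the question. This is only a sketch, and it assumes the text is cp1255 bytes that were mis-read as Latin-1, as the question's mention of ISO-8859-1 suggests. The variable names are mine:

```python
# -*- coding: utf-8 -*-

# The sample string from the question, as it appears when mis-decoded.
garbled = u"áï éäåãä"

# Step 1: undo the wrong decoding to get the original bytes back
# (assumes the bytes were read as Latin-1).
original_bytes = garbled.encode("latin-1")

# Step 2: decode those bytes with the codec the file was probably
# written in (assumed to be cp1255).
fixed = original_bytes.decode("cp1255")
```

If the assumption holds, fixed is the Hebrew text בן יהודה, and no lookup tables are needed.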