This is my code:
def digraph(chars):
    als = "шжяеёющчШЖЯЕЁЮЩЧ"
    new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
           'ju':als[5],'sx':als[6],'ch':als[7],'Sh':als[8],'Zh':als[9],
           'Ja':als[10],'Je':als[11],'Jo':als[12],'Ju':als[13],'Sx':als[14],
           'Ch':als[15],'SH':als[8],'ZH':als[9],'JA':als[10],'JE':als[11],
           'JO':als[12],'JU':als[13],'SX':als[14],'CH':als[15]}
    try:
        return new[chars]
    except:
        return "[Error]"

def trans_cyr(inp):
    cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
    latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
    digs = ['sh','zh','ja','je','jo','ju','sx','ch','Sh','Zh',
            'Ja','Je','Jo','Ju','Sx','Ch','SH','ZH','JA','JE','JO','JU','SX',
            'CH']
    prevc = ""
    for e, char in enumerate(inp):
        if(prevc != ""):
            comb = prevc + char
            newdig = digraph(comb)
            if(comb in digs):
                print(newdig, end="")
                prevc = ""
            else:
                pos = latin.index(char)
                posp = latin.index(inp[e - 1])
                if(inp[e-1] in "szjcSZJC"):
                    print(cyrillic[posp] + cyrillic[pos], end="")
                    prevc = ""
                else:
                    prevc=""
                    continue
        elif(char not in "szjcSZJC"):
            try:
                pos = latin.index(char)
                print(cyrillic[pos], end="")
            except:
                print(char, end="")
        else:
            prevc = char

while True:
    cyrinp = input("\n> ")
    trans_cyr(cyrinp)
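The code is supposed to transliterate Latin letters into Cyrillic. It takes each character from the input (unless it is one of 'szjc' or their uppercase equivalents), finds its position with the index() function, and then takes the Cyrillic character at the same position as the Latin one.

However, the Cyrillic alphabet has letters such as Я, Е, Ё, Ю, Ж, Ш, Щ, Ч, which correspond to digraphs (ya, ye, yo, yu, zh, sh, shch (sx), ch; the code spells the first four ja, je, jo, ju), so they cannot be transliterated from a single character. What I do is check whether the current letter is one of 'szjcSZJC'. If it is, I don't print it but store it as prevc, and if the next character combined with prevc is in the list 'digs', I print the matching Cyrillic letter.

Everything works perfectly: if I enter 'jojajo' it prints "ёяё", just as it should. But if there is an unfinished digraph (c without h, s without h or x, z without h, or j without a, e, u or o), then the digraph that follows it is not transliterated. Example: if I enter sjo, my expected output is сё, but what I get is сйо. Is there any way to fix this?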
Edit
I wrote this code:
while True:
    cyrillic = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
    latin = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
    als = "шжяеёющчШЖЯЕЁЮЩЧ"
    new = {'sh':als[0],'zh':als[1],'ja':als[2],'je':als[3],'jo':als[4],
           'ju':als[5],'sx':als[6],'ch':als[7],'Sh':als[8],'Zh':als[9],
           'Ja':als[10],'Je':als[11],'Jo':als[12],'Ju':als[13],'Sx':als[14],
           'Ch':als[15],'SH':als[8],'ZH':als[9],'JA':als[10],'JE':als[11],
           'JO':als[12],'JU':als[13],'SX':als[14],'CH':als[15]}
    inp = input("\n> ") + " "
    digraph = ""
    prevc = ""
    for e, char in enumerate(inp):
        part_j = "jJ"
        part_v = "aeouAEOU"
        part_z = "zZ"
        part_h = "hH"
        part_s = "sS"
        part_x = "hxHX"
        part_c = "cC"
        if((char in part_j and inp[e+1] in part_v) or (char in part_z and inp[e+1] in part_h) or (char in part_s and inp[e+1] in part_x) or (char in part_c and inp[e+1] in part_h)):
            digraph = "yes"
        else:
            digraph = "no"
        if((char in part_v and inp[e-1] in part_j) or (char in part_h and inp[e-1] in part_z) or (char in part_x and inp[e-1] in part_s) or (char in part_h and inp[e-1] in part_c)):
            comb = inp[e-1] + char
            dig = new[comb]
            print(dig, end="")
        elif(digraph == "yes"):
            prevc = char
        else:
            try:
                print(cyrillic[latin.index(char)],end="")
            except:
                print(char, end="")
It seems to use the same logic as the answer I accepted, and it works :)
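Answer 0: (score: 0)
Why not look at the character that follows one of szjcSZJC and check whether it is one of the characters that can complete a digraph (for 'j', that would be 'o', for example)? If it is not, transliterate the first letter normally and move on to the next letter, where the digraph 'jo' will then be recognised.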
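A minimal sketch of that lookahead idea, assuming the tables from the question. The names SINGLE, DIGRAPHS and transliterate are made up for this illustration, and instead of testing only after szjcSZJC it simply checks at every position whether the next two characters form a known digraph, which comes to the same thing:

```python
# Tables copied from the question; all names here are hypothetical.
CYRILLIC = "абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
LATIN = "abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
SINGLE = dict(zip(LATIN, CYRILLIC))

DIGRAPH_KEYS = ["sh", "zh", "ja", "je", "jo", "ju", "sx", "ch"]
DIGRAPH_VALUES = "шжяеёющч"
DIGRAPHS = {}
for key, value in zip(DIGRAPH_KEYS, DIGRAPH_VALUES):
    DIGRAPHS[key] = value                       # sh -> ш
    DIGRAPHS[key.capitalize()] = value.upper()  # Sh -> Ш
    DIGRAPHS[key.upper()] = value.upper()       # SH -> Ш


def transliterate(text):
    """Transliterate Latin text to Cyrillic, looking one character ahead."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # The next two characters form a digraph: consume both.
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Not a digraph: map the single character, or keep it
            # unchanged if it is not in the table.
            ch = text[i]
            out.append(SINGLE.get(ch, ch))
            i += 1
    return "".join(out)


print(transliterate("sjo"))     # сё
print(transliterate("jojajo"))  # ёяё
```

Because an unfinished digraph only consumes one character, the letter after it is still available to start a new digraph.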
Answer 1: (score: 0)
Here is a solution that uses the same logic as your approach, but is written more clearly.
CYRILLIC = u"абцдэфгхийклмнопрстувъыьзАБЦДЭФГХИЙКЛМНОПРСТУВЫЗ "
LATIN = u"abcdefghijklmnoprstuv$y'zABCDEFGHIJKLMNOPRSTUVYZ "
DIGRAPHS = u"шжяеёющчШЖЯЕЁЮЩЧШЖЯЕЁЮЩЧ"
LATIN_DIGRAPHS = [u'sh', u'zh', u'ja', u'je', u'jo', u'ju', u'sx', u'ch',
                  u'Sh', u'Zh', u'Ja', u'Je', u'Jo', u'Ju', u'Sx', u'Ch',
                  u'SH', u'ZH', u'JA', u'JE', u'JO', u'JU', u'SX', u'CH']
# Map every Latin symbol (single letter or digraph) to its Cyrillic counterpart.
MAPPING = dict(zip(list(LATIN) + LATIN_DIGRAPHS, CYRILLIC + DIGRAPHS))
DIGRAPH_FIRST_LETTER = u'szjcSZJC'

def latin_to_cyrillic(word):
    translation = []
    possible_digraph = False
    previous_letter = u''
    for letter in word:
        if possible_digraph:
            combination = previous_letter + letter
            if combination in LATIN_DIGRAPHS:
                translation.append(MAPPING[combination])
                possible_digraph = False
            else:
                # Unfinished digraph: emit the pending letter on its own.
                translation.append(MAPPING[previous_letter])
                if letter in DIGRAPH_FIRST_LETTER:
                    # The current letter may itself start a digraph.
                    previous_letter = letter
                else:
                    # Unknown characters are passed through unchanged.
                    translation.append(MAPPING.get(letter, letter))
                    possible_digraph = False
        else:
            if letter in DIGRAPH_FIRST_LETTER:
                possible_digraph = True
                previous_letter = letter
            else:
                translation.append(MAPPING.get(letter, letter))
    # A pending first letter at the end of the word is emitted as is.
    if possible_digraph:
        translation.append(MAPPING[previous_letter])
    return ''.join(translation)

print(latin_to_cyrillic('sjo'))     # сё
print(latin_to_cyrillic('jojajo'))  # ёяё
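The logic is as follows.
Instead of finding the index of a letter in the Latin alphabet and using that index into the Cyrillic one, you can simply use a dictionary that maps each symbol of one alphabet to the other. Just create one list of all Latin symbols (single letters and digraphs) and another one for the Cyrillic symbols. You only need to make sure that corresponding symbols are in the same order in both. Then dict(zip(alphabet1, alphabet2)) creates a mapping from each symbol of alphabet1 to the symbol at the same index in alphabet2.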
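As a small illustration of the dict(zip(...)) idiom, here is a sketch with two shortened, made-up symbol lists (not the full tables used above):

```python
# Shortened example alphabets, in matching order (illustrative only).
latin_symbols = ["a", "b", "s", "sh", "jo"]
cyrillic_symbols = ["а", "б", "с", "ш", "ё"]

# zip pairs the symbols position by position; dict turns the pairs
# into a lookup table.
mapping = dict(zip(latin_symbols, cyrillic_symbols))

print(mapping["sh"])          # ш
print(mapping["jo"])          # ё
# .get() with a default leaves unknown symbols unchanged.
print(mapping.get("?", "?"))  # ?
```

A dictionary lookup also handles digraphs and single letters the same way, which is something index() on a string of single characters cannot do.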