UnicodeWarning: special characters in Tkinter

Time: 2011-11-07 12:27:43

Tags: python encoding tkinter character

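I have written a program in Tkinter (Python 2.7): a Scrabble helper for Norwegian. Norwegian has some special characters (æøå), so my word list (ordliste) contains words with those characters.

When I run my function finnord(c*), it returns 'cd'. I use entry.get() to get the word for my function.

My problem is with the encoding of entry.get(). My local encoding is UTF-8, but when I type any special characters into the entry box and match them against my word list, I get a Unicode error.

Here is my output: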

Warning (from warnings module):
  File "C:\pythonprog\scrabble\feud.py", line 46
    if s not in liste and s in ordliste:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

When I type the following in my shell:

> ordinn.get()
u'k\xf8**e'
> ordinn.get().encode('utf-8')
'k\xc3\xb8**e'
> print ordinn.get()
kø**e
> print ordinn.get().encode('utf-8')
kø**e

Does anyone know why I can't match ordinn.get() (the entry) against my word list?

1 answer:

Answer 0 (score: 6)

I can reproduce the error this way:

% python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'k\xf8**e' in [u'k\xf8**e']
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

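So it may be that s is a str object while liste or ordliste contains unicode, or (as eryksun points out in the comments) vice versa. The solution is to decode the str objects (most likely using the utf-8 codec) so that they become unicode objects.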
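
As a minimal sketch of that fix, assuming the byte string really is UTF-8 encoded: the names `raw` and `wordlist` below are made up for illustration and do not come from the question. The explicit `b''` and `u''` literals keep the behaviour the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical names: raw and wordlist are illustrative only.
raw = b'k\xc3\xb8**e'        # UTF-8 encoded bytes (a plain str in Python 2)
wordlist = [u'k\xf8**e']     # unicode text, like the value the Entry returned

# Decode the bytes once; after that both sides are unicode and compare cleanly.
text = raw.decode('utf-8')
found = text in wordlist
print(found)
```

Decoding once at the point where the data enters the program means the comparison in `if s not in liste and s in ordliste` never has to mix the two string types.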

If that does not help, please print and post the output of:

print(repr(s))
print(repr(liste))
print(repr(ordliste))

I believe the problem can be avoided by converting all strings to unicode.

  1. When you generate ordliste from norsk.txt, use codecs.open('norsk.txt','r','utf-8'):

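    import codecs

    # codecs.open decodes each line, so ordliste holds unicode objects.
    # It opens the file in binary mode, so strip '\r' as well as '\n'.
    with codecs.open('norsk.txt','r','utf-8') as fil:
        ordliste = [line.rstrip(u'\r\n') for line in fil]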
    
  2. Convert all user input to unicode as early as possible:

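    def get_unicode(widget):
        streng = widget.get()
        try:
            # A byte string (str) is decoded to unicode here.
            streng = streng.decode('utf-8')
        except UnicodeEncodeError:
            # Already unicode with non-ASCII characters: keep it as is.
            pass
        return streng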
    

  3. So perhaps try this:

    # -*- coding: utf-8 -*-
    import Tkinter as tk
    import tkMessageBox
    import codecs
    import itertools
    import sys
    
    alfabetet = (u"abcdefghijklmnopqrstuvwxyz"
                 u"\N{LATIN SMALL LETTER AE}"
                 u"\N{LATIN SMALL LETTER O WITH STROKE}"
                 u"\N{LATIN SMALL LETTER A WITH RING ABOVE}")
    
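    # Assumes norsk.txt is saved in the console's encoding; pass 'utf-8'
    # explicitly if the file is UTF-8, as in step 1.
    encoding = sys.stdin.encoding
    with codecs.open('norsk.txt','r',encoding) as fil:
        ordliste = set(line.rstrip(u'\r\n') for line in fil)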
    
    def get_unicode(widget):
        streng = widget.get()
        if isinstance(streng,str):
            streng = streng.decode('latin-1')
        return streng
    
    def siord():
        alfa=lagtabell()
        try:
            streng = get_unicode(ordinn)
            ordene=finnord(streng,alfa)
            if len(ordene) == 0:
                # There are no words that match
                tkMessageBox.showinfo('Dessverre..','Det er ingen ord som passer...')
            else:
                # Done: The words that fit the pattern
                tkMessageBox.showinfo('Ferdig',
                    'Ordene som passer er:\n'+ordene.encode('utf-8'))
        except Exception as err:
            # There has been a mistake .. Check your word
            print(repr(err))
            tkMessageBox.showerror('ERROR','Det har skjedd en feil.. Sjekk ordet ditt.')
    
    def finnord(streng,alfa): 
        liste = set()
        for substitution in itertools.permutations(alfa,streng.count(u'*')):
            s = streng
            for ch in substitution:
                s = s.replace(u'*',ch,1)
            if s in ordliste:
                liste.add(s)
        liste = [streng]+list(liste)
        return u','.join(liste)+u'.'
    
    def lagtabell():
        tinbox = get_unicode(bokstinn)
        if not tinbox.isalpha():
            alfa = alfabetet
        else:
            alfa = tinbox.lower()
        return alfa
    
    root = tk.Tk()
    root.title('FeudHjelper av Martin Skow Røed')
    root.geometry('400x250+450+200')
    # root.iconbitmap('data/ikon.ico')
    
    skrift1 = tk.Label(root,
                    text = '''\
    Velkommen til FeudHjelper. Skriv inn de bokstavene du har, og erstatt ukjente med *.
    F. eks: sl**ge
    Det er kun lov til å bruke tre stjerner, altså tre ukjente bokstaver.''',
                    font = ('Verdana',8), wraplength=350)
    skrift1.pack(pady = 5)
    
    ordinn = tk.StringVar(None)
    tekstboks = tk.Entry(root, textvariable = ordinn)
    tekstboks.pack(pady = 5)
    
    # What letters do you have? Eg "ahneki". Leave blank here if you want all the words.
    skrift2 = tk.Label(root, text = '''Hvilke bokstaver har du? F. eks "ahneki". La det være blankt her hvis du vil ha alle ordene.''',
                    font = ('Verdana',8), wraplength=350)
    skrift2.pack(pady = 10)
    
    bokstinn = tk.StringVar(None)
    tekstboks2 = tk.Entry(root, textvariable = bokstinn)
    tekstboks2.pack()
    
    knapp = tk.Button(text = 'Finn ord!', command = siord)
    knapp.pack(pady = 10)
    root.mainloop()
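
The three steps above can be tried without the GUI. This is only a sketch under stated assumptions: `load_words`, `to_unicode` and `match_pattern` are hypothetical names, the word list is a throwaway file made up for the demo, and `io.open` is swapped in for `codecs.open` because it accepts an `encoding` argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import itertools
import os
import tempfile

def load_words(path, encoding='utf-8'):
    """Read one word per line and return a set of unicode strings."""
    with io.open(path, 'r', encoding=encoding) as fil:
        return set(line.strip() for line in fil if line.strip())

def to_unicode(value, encoding='utf-8'):
    """Decode byte strings; pass unicode text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

def match_pattern(pattern, letters, words):
    """Fill each u'*' in pattern from the available letters, keep real words."""
    hits = set()
    for combo in itertools.permutations(letters, pattern.count(u'*')):
        candidate = pattern
        for ch in combo:
            candidate = candidate.replace(u'*', ch, 1)
        if candidate in words:
            hits.add(candidate)
    return sorted(hits)

# Demo with a throwaway word list (contents made up for illustration).
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(u'k\xf8lle\nk\xf8dde\nkake\n')
    words = load_words(path)
finally:
    os.remove(path)

pattern = to_unicode(b'k\xc3\xb8**e')   # bytes in, unicode out
result = match_pattern(pattern, u'lld', words)
print(result)
```

With the letters `lld` only one of the two candidate words can be formed, because `itertools.permutations` uses each available letter at most once, just as each tile can be played only once.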