Python: fuzzy.DMetaphone 'ascii' codec error

Time: 2017-10-24 21:48:29

Tags: python fuzzy fuzzywuzzy

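How is it possible that, with the same input, I sometimes get an ascii codec error and sometimes it works just fine? The code cleans up a name and builds its Soundex and DMetaphone values. It works in only some out of 5 runs, sometimes more often :)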

UPD: It looks like this is a problem in fuzzy.DMetaphone, at least on Python 2.7 with Unicode. For now I plan to integrate Metaphone. Any solutions to the fuzzy.DMetaphone problem are very welcome :)

UPD 2: The problem went away after updating fuzzy to 1.2.2. The same code now works fine.
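Since the update above points to fuzzy 1.2.2 as the release where the error stopped, here is a minimal sketch for checking whether an environment already has it. It assumes Python 3.8+ (for `importlib.metadata`, whereas the question used 2.7) and assumes the distribution is registered under the name `Fuzzy`. `version_tuple`, `has_fix` and `installed_fuzzy_version` are hypothetical helper names, not part of the library.

```python
# Sketch only: checks whether the installed fuzzy release is at least 1.2.2,
# the version reported as fixed in the update above.
import re
from importlib import metadata  # Python 3.8+; an assumption, the question used 2.7

FIXED_VERSION = "1.2.2"


def version_tuple(version):
    """Turn '1.2.2' into (1, 2, 2), using the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part with no leading number
        parts.append(int(match.group()))
    return tuple(parts)


def has_fix(installed, fixed=FIXED_VERSION):
    """Return True if `installed` is the same as or newer than `fixed`."""
    return version_tuple(installed) >= version_tuple(fixed)


def installed_fuzzy_version(dist_name="Fuzzy"):
    """Return the installed version string, or None if it is not installed.

    The distribution name is an assumption; adjust it if your index differs.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_fuzzy_version()
    if found is None:
        print("fuzzy is not installed")
    else:
        print(found, "has the fix" if has_fix(found) else "predates 1.2.2")
```

The comparison is deliberately simple and ignores pre-release suffixes; a project that already depends on a version-parsing library may prefer that instead.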

import re
import fuzzy
import sys

def make_namecard(full_name):
    soundex = fuzzy.Soundex(4)
    dmeta = fuzzy.DMetaphone(4)
    names = process_name(full_name)
    print names
    soundexes = map(soundex, names)
    dmetas = []
    for name in names:
        print name
        dmetas.extend(list(dmeta(name)))
    dmetas = filter(bool, dmetas)

    return {
        "full_name": full_name,
        "soundex": soundexes,
        "dmeta": dmetas,
        "names": names,
    }

def process_name(full_name):
    full_name = re.sub("[_-]", " ", full_name)
    full_name = re.sub(r'[^A-Za-z0-9 ]', "", full_name)
    names = full_name.split()
    names = filter(valid_name, names)
    return names

def valid_name(name):
    COMMON_WORDS = ["the", "of"]
    return len(name) >= 2 and name.lower() not in COMMON_WORDS

print make_namecard('Jerusalem Warriors') 

Output:

➜ python2.7 make_namecard.py
['Jerusalem', 'Warriors']
Jerusalem
Warriors
{'soundex': [u'J624', u'W624'], 'dmeta': [u'\x00\x00\x00\x00', u'ARSL', u'ARRS', u'FRRS'], 'full_name': 'Jerusalem Warriors', 'names': ['Jerusalem', 'Warriors']}

➜ python2.7 make_namecard.py
['Jerusalem', 'Warriors']
Jerusalem
Traceback (most recent call last):
  File "make_namecard.py", line 38, in <module>
    print make_namecard('Jerusalem Warriors') 
  File "make_namecard.py", line 16, in make_namecard
    dmetas.extend(list(dmeta(name)))
  File "src/fuzzy.pyx", line 258, in fuzzy.DMetaphone.__call__
UnicodeDecodeError: 'ascii' codec can't decode byte 0xab in position 0: ordinal not in range(128)
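The `\x00\x00\x00\x00` entry in the successful run and the stray byte 0xab in the failing one suggest that the encoder sometimes hands back bytes that are not valid codes, though that is an inference from the output rather than a confirmed diagnosis. For setups that cannot upgrade, one possible stopgap is to wrap the call and discard anything that does not look like a phonetic code. The sketch below assumes codes consist only of ASCII uppercase letters and digits. `safe_codes`, `is_plausible_code` and `flaky_encoder` are hypothetical names, and `flaky_encoder` is only a stand-in that mimics the output shown above, not the real fuzzy API.

```python
# Sketch only: a defensive wrapper around any phonetic encoder callable.
# Nothing here is part of fuzzy; all helper names are hypothetical.
import string

ALLOWED = set(string.ascii_uppercase + string.digits)
TEXT_TYPES = (bytes, type(u""))  # covers str/unicode on Python 2 and bytes/str on 3


def is_plausible_code(code):
    """Accept only non-empty codes made of ASCII uppercase letters or digits."""
    if not code:
        return False
    if isinstance(code, bytes):
        try:
            code = code.decode("ascii")
        except UnicodeDecodeError:
            return False
    return all(ch in ALLOWED for ch in code)


def safe_codes(encoder, name):
    """Call encoder(name) and keep only plausible codes.

    Returns an empty list if the encoder raises UnicodeDecodeError.
    """
    try:
        result = encoder(name)
    except UnicodeDecodeError:
        return []
    # A single code (as Soundex returns) is wrapped so both shapes are handled.
    if result is None or isinstance(result, TEXT_TYPES):
        result = [result]
    return [code for code in result if is_plausible_code(code)]


def flaky_encoder(name):
    """Stand-in mimicking the garbage entry seen in the question's output."""
    return [u"\x00\x00\x00\x00", u"ARSL"]


print(safe_codes(flaky_encoder, "Jerusalem"))
```

In `make_namecard`, the loop body could then call `dmetas.extend(safe_codes(dmeta, name))`. This only hides the symptom: a name whose encoding fails ends up with fewer codes, so upgrading the library remains the actual fix.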

0 answers:

No answers yet