Question

Given the following code:

phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259–1234']
phoneCheck = re.compile('^[1-9]\d{2}-\d{3}-\d{4}$')

for pn in phonenumbers:
    print pn
    if phoneCheck.match(str(pn)):
        print 'Matched!'
    else:
        print 'Not Matched!'

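I get the error below in the results, and I think it is caused by the wrong kind of dash in one of the phone numbers. How can I fix this so that the number is simply flagged as not matched?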

(209) 576-6546
Not Matched!
509-477-6726
Not Matched!
None
Not Matched!
229-259–9756
Runtime error 
Traceback (most recent call last):
  File "<string>", line 6, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7: ordinal not in range(128)

Answer 1

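Your diagnosis is correct. (The second dash in the last phone number is some kind of fancy dash, and I bet you copied and pasted the phone numbers from a word processor or spreadsheet. Anyway...)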
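To see where the error comes from: in Python 2, calling str() on a unicode value encodes it implicitly with the ASCII codec, and the en dash (U+2013) has no ASCII representation. The sketch below reproduces the same failure with an explicit encode call, which behaves the same way on Python 2 and Python 3. The variable names are only illustrative.

```python
# The en dash (U+2013) sits at index 7 of the last phone number.
pn = u'229-259\u20131234'

try:
    # Python 2's str(pn) performs this ASCII encoding implicitly for unicode input.
    pn.encode('ascii')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    # Index of the first character that could not be encoded.
    bad_position = exc.start
    print(exc)
```

The reported position agrees with the "position 7" in the traceback from the question.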

Here is the quick and easy way: install the unidecode package, then:

import re
import warnings

import unidecode

dash = u'\u2013'
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259' + dash + '1234']
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')

# if you pass an ascii string into unidecode, it will complain, but still work.
# Just catch the warnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    for pn in phonenumbers:
        print pn

        # if pn is None, it's not a phone number (and None will cause unidecode
        # to throw an error)
        if pn and phoneCheck.match(unidecode.unidecode(pn)):
            print 'Matched!'
        else:
            print 'Not Matched!'
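One caveat: as far as I know, unidecode transliterates the en dash to a plain hyphen, so with the code above the last number would come out as matched. If the goal is to flag it as not matched, as the question asks, a dependency-free alternative is to skip the str() call and match the text directly. This is only a minimal sketch: the helper name is_valid_phone is made up for illustration, and it assumes the numbers arrive as unicode text (or None).

```python
import re

# Same pattern as in the question, written as a raw string.
PHONE_CHECK = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')


def is_valid_phone(pn):
    """Hypothetical helper: True only for NNN-NNN-NNNN with plain ASCII hyphens."""
    # None (or an empty string) is not a phone number.
    if not pn:
        return False
    # Match the text as-is; without str(), nothing is encoded to ASCII.
    return PHONE_CHECK.match(pn) is not None


phonenumbers = [u'(209) 525-2987', u'509-477-4598', None, u'229-259\u20131234']

for pn in phonenumbers:
    print('Matched!' if is_valid_phone(pn) else 'Not Matched!')
```

Because the en dash is not the hyphen the pattern expects, the last number fails the match without raising an error. The loop does not print pn itself, since printing unicode to an ASCII-only console in Python 2 could raise the same UnicodeEncodeError.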
