Given the following code:
import re
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259–1234']
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')
for pn in phonenumbers:
    print pn
    if phoneCheck.match(str(pn)):
        print 'Matched!'
    else:
        print 'Not Matched!'
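I get the error below in the results, and I think it is caused by the wrong kind of dash being used in one of the phone numbers. How can I fix this so that the number is simply flagged as Not Matched?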
(209) 576-6546
Not Matched!
509-477-6726
Not Matched!
None
Not Matched!
229-259–9756
Runtime error
Traceback (most recent call last):
  File "<string>", line 6, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 7: ordinal not in range(128)
Answer 0 (score: 1)
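Your diagnosis is correct. The second dash in the last phone number is a fancy dash (an en dash, U+2013, as the traceback shows), and calling str() on a unicode string that contains it fails under the default ASCII codec. I'd bet you copied and pasted the phone numbers from a word processor or spreadsheet. Anyway...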
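To confirm the diagnosis, here is a small sketch that lists the non-ASCII characters in the failing number and reproduces the encoding failure. It is written for Python 3 (the question's code is Python 2, where str() attempts this ASCII encoding implicitly), and the variable names are only illustrative:

```python
import unicodedata

# Last number from the question, with the en dash written as an explicit escape.
pn = '229-259\u20131234'

# Report every character outside the ASCII range with its position and Unicode name.
non_ascii = [(i, unicodedata.name(ch)) for i, ch in enumerate(pn) if ord(ch) > 127]
print(non_ascii)

# Encoding to ASCII fails on the en dash, which is what str() tried in Python 2.
try:
    pn.encode('ascii')
    encodable = True
except UnicodeEncodeError as exc:
    encodable = False
    print(exc)
```

The reported position should line up with the "position 7" in the traceback.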
Here is the quick and easy way: install the unidecode package, then:
import re
import warnings
import unidecode
dash = u'\u2013'
phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259' + dash + '1234']
phoneCheck = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')
# if you pass an ascii string into unidecode, it will complain, but still work.
# Just catch the warnings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for pn in phonenumbers:
        print pn
        # if pn is None, it's not a phone number (and None will cause unidecode
        # to throw an error)
        if pn and phoneCheck.match(unidecode.unidecode(pn)):
            print 'Matched!'
        else:
            print 'Not Matched!'
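Note that, as far as I know, unidecode transliterates the en dash to a plain hyphen, so the approach above normalizes the last number rather than rejecting it. If you really want it flagged as Not Matched, one alternative is to drop the str() call and match the text directly, since the pattern only accepts an ASCII hyphen. A minimal sketch, assuming Python 3; `check` is a hypothetical helper name, not part of any library:

```python
import re

phone_check = re.compile(r'^[1-9]\d{2}-\d{3}-\d{4}$')

def check(pn):
    """Return 'Matched!' only for text in NNN-NNN-NNNN form with ASCII hyphens."""
    # None (or any other non-string) is never a phone number, so no str() is needed.
    if isinstance(pn, str) and phone_check.match(pn):
        return 'Matched!'
    return 'Not Matched!'

phonenumbers = ['(209) 525-2987', '509-477-4598', None, '229-259\u20131234']
results = [check(pn) for pn in phonenumbers]
for pn, result in zip(phonenumbers, results):
    print(pn, result)
```

In Python 2 the same idea would test `isinstance(pn, basestring)` instead, which accepts both byte strings and unicode strings without any implicit ASCII encoding.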