String slicing in Python 2.7 is very handy for extracting substrings. It works for ASCII characters, for example:
>>> s = "Antonio"
>>> s[5:7]
'io'
But it fails when the string contains accented characters, for example:
>>> s = "António"
>>> s[5:7]
'ni'
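What is a safe way to get the correct substring, whether or not the original string contains accented characters?
Update: my configuration is as follows: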
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 12:54:16)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
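Answer 0 (score: 2)
In Python 2.7, byte strings (str) and unicode strings are different types. A plain str holds bytes, and in a UTF-8 encoded source or terminal "ó" takes two bytes, so len() and slice indices count bytes rather than characters. To declare a Unicode string literal, prefix it with u: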
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "António"
>>> len(s)
8
>>> s2 = u"António"
>>> len(s2)
7
>>> s[5:7]
'ni'
>>> s2[5:7]
u'io'
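When the text arrives as a byte string rather than as a literal (read from a file or a socket, say), the same idea applies: decode first, then slice. The following is a minimal sketch, assuming the bytes are UTF-8 encoded; the helper name `safe_slice` is hypothetical and not part of any library.

```python
def safe_slice(raw, start, stop, encoding="utf-8"):
    """Slice by characters rather than bytes.

    `raw` is assumed to be a byte string in `encoding` (an assumption);
    text that is already decoded is sliced as-is.
    """
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return text[start:stop]


raw = b"Ant\xc3\xb3nio"  # UTF-8 bytes of "António"

# Slicing the raw bytes cuts at byte offsets, which gives the wrong result.
print(raw[5:7] == b"ni")

# Slicing after decoding cuts at character offsets.
print(safe_slice(raw, 5, 7) == u"io")
print(safe_slice(b"Antonio", 5, 7) == u"io")
```

All three comparisons print True. The sketch runs unchanged on Python 2.7 and Python 3, because `bytes` is an alias for `str` in Python 2.7.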
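Answer 1 (score: 1)
I finally found the answer to my question. I just needed to read the text file like this, so that each line is decoded to unicode as it is read: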
import codecs

# ficheiro is the path to the UTF-8 encoded input file
with codecs.open(ficheiro, encoding='utf-8') as fin:
    for line in fin:
        ...  # then here line[5:7] will work correctly for "António" and "Antonio"
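For reference, `io.open` also decodes while reading and exists in both Python 2.7 and Python 3 (where it is the built-in `open`). The sketch below assumes a UTF-8 file; the temporary file and its name `names.txt` are made up for illustration and stand in for `ficheiro`.

```python
import io
import os
import tempfile

# Hypothetical sample file standing in for `ficheiro`.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with io.open(path, "w", encoding="utf-8") as fout:
    fout.write(u"Ant\xf3nio\nAntonio\n")  # "\xf3" is "ó"

with io.open(path, encoding="utf-8") as fin:
    # Each line is already decoded text, so indices count characters.
    slices = [line.rstrip(u"\n")[5:7] for line in fin]

print(slices == [u"io", u"io"])
```

Both names yield the same slice because the file object returns unicode text, not raw bytes.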
Derek Dohler