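String slicing does not work with accented characters

Date: 2016-06-15 23:51:35

Tags: python string

String slicing in Python 2.7 is very handy for getting substrings. It works for ASCII characters, for example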

>>> s = "Antonio"
>>> s[5:7]
'io'

but it returns the wrong substring when the string contains accented characters, for example

>>> s = "António"
>>> s[5:7]
'ni'

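What is a safe way to get the correct substring, regardless of whether the original string contains accented characters?

Update: my configuration is as follows: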

Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 12:54:16) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin


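2 answers:

Answer 0 (score: 2)

In Python 2.7, byte strings (str) and Unicode strings (unicode) are different types. A plain "António" literal holds the encoded bytes; when the terminal uses UTF-8, as the length of 8 below suggests, the ó takes two bytes, so every index after it is shifted by one. To declare a Unicode string literal, which is indexed by character, prefix it with u: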

Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "António"
>>> len(s)
8
>>> s2 = u"António"
>>> len(s2)
7
>>> s[5:7]
'ni'
>>> s2[5:7]
u'io'
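
When the value arrives as a byte string rather than a literal you can prefix with u, the same effect can be had by decoding it first. The following is a minimal sketch, assuming the bytes are UTF-8 encoded (that is an assumption about the input, not something Python can detect for you); the names raw and text are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Build the UTF-8 byte string explicitly so the example behaves the same
# on Python 2.7 and Python 3 (assumption: the data is UTF-8 encoded).
raw = u"António".encode("utf-8")

# Decode to a Unicode string before slicing.
text = raw.decode("utf-8")

print(len(raw))             # 8 bytes: the accented letter takes two
print(len(text))            # 7 characters
print(raw[5:7] == b"ni")    # True: byte-based slice is off by one
print(text[5:7] == u"io")   # True: character-based slice is correct
```

Decoding once at the point where the data enters the program, and slicing only Unicode strings afterwards, avoids having to think about byte offsets anywhere else.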

Answer 1 (score: 1)

I finally found the answer to my question. I just needed to read the text file like this:

import codecs
with codecs.open(ficheiro, encoding='utf-8') as fin:
    for line in fin:
        ...  # here line[5:7] works correctly for both "António" and "Antonio"

Thanks to Derek Dohler, who wrote Solving Unicode Problems in Python 2.7.
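
For a version that can be run end to end, here is a small sketch of the same approach using io.open, which accepts an encoding argument on both Python 2.7 and Python 3. The temporary file and the names path and slices are hypothetical stand-ins for the real input file (ficheiro above), and the file is assumed to be UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import io
import os
import tempfile

# Hypothetical sample file standing in for the real input file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8") as fout:
    fout.write(u"António\nAntonio\n")

# Reading with an explicit encoding yields Unicode strings,
# so slicing counts characters rather than bytes.
slices = []
with io.open(path, encoding="utf-8") as fin:
    for line in fin:
        slices.append(line[5:7])

os.remove(path)
print(slices == [u"io", u"io"])  # True: both lines give the same slice
```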