Python: regular expressions and string length (in bytes)

Asked: 2011-07-01 15:52:30

Tags: python regex string-length

I'm writing a program in Python and have a few questions (I'm 100% new to Python):

import re

rawData = '7I+8I-7I-9I-8I-'

print len(rawData)

rawData = re.sub(r"[0-9]I\+","",rawData)
rawData = re.sub(r"[0-9]I\-","",rawData)

print rawData
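
  1. How can I use | to combine the two regular expressions into one? That is, remove both forms, such as 9I- and 9I+, with a single regex operation.
  2. Does len(rawData) return the length of rawData in bytes?

Thanks.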

3 Answers:

Answer 0 (score: 5)

See the difference:

$ python3
Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # Unicode text
3
>>> 

$ python
Python 2.7.1 (r271:86832, May 20 2011, 17:19:04) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len('día')   # bytes
4
>>> len(u'día')  # Unicode text
3
>>>


$ python3
Python 3.1.3 (r313:86834, May 20 2011, 06:10:42) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> len(b'día')
  File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
>>> len(b'dia')
3
>>> 
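
To get a byte count explicitly in Python 3, one option is to encode the text first and take the length of the result. This is only a sketch: it assumes UTF-8, which the sessions above do not state (the result of 4 under Python 2 is consistent with a UTF-8 terminal), and the variable names are made up for illustration.

```python
# Assumption: UTF-8 encoding; another encoding would give different byte counts.
text = 'd\u00eda'       # the word 'día', written with an escape for the accented letter
raw_data = '7I+8I-7I-9I-8I-'

char_count = len(text)                    # number of Unicode code points
byte_count = len(text.encode('utf-8'))    # number of bytes after encoding

# For pure-ASCII data such as rawData, both counts are the same.
ascii_chars = len(raw_data)
ascii_bytes = len(raw_data.encode('utf-8'))

print(char_count, byte_count, ascii_chars, ascii_bytes)
```

For ASCII-only input like the rawData in the question, the character count and the UTF-8 byte count are the same, so the distinction only matters once non-ASCII characters appear.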

Answer 1 (score: 0)

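len returns the number of characters when applied to a unicode string (there are nuances here that the other answers cover in more detail), the number of bytes in an encoded string, the number of items in a list (or in a set, or the number of keys in a dictionary)...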

rawData = re.sub(r"[0-9]I(\+|-)","",rawData)
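
As a possible variation on the line above, the two patterns can be merged either with | between the full alternatives or with a character class for the sign. The sketch below is written for Python 3; the sample string with extra letters is hypothetical and only there to make the effect visible.

```python
import re

raw_data = '7I+8I-7I-9I-8I-'
sample = 'a7I+b8I-c'   # hypothetical input, not from the question

# Alternation between the two original patterns.
pattern_alt = re.compile(r'[0-9]I\+|[0-9]I-')
# Equivalent form: a character class matching either sign.
pattern_cls = re.compile(r'[0-9]I[+-]')

cleaned_alt = pattern_alt.sub('', sample)
cleaned_cls = pattern_cls.sub('', sample)
cleaned_raw = pattern_cls.sub('', raw_data)

print(repr(cleaned_alt), repr(cleaned_cls), repr(cleaned_raw))
```

Inside a character class, + needs no escaping, and - is literal when it is placed last.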

Answer 2 (score: 0)

Why not take a different approach and use the replace method?
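
One way this suggestion might look is sketched below. Since str.replace only handles fixed substrings, the sketch loops over every digit and sign combination; that loop is an assumption about what was meant, not something the answer spells out.

```python
raw_data = '7I+8I-7I-9I-8I-'

cleaned = raw_data
# str.replace matches literal text only, so try each digit/sign pair in turn.
for digit in '0123456789':
    for sign in '+-':
        cleaned = cleaned.replace(digit + 'I' + sign, '')

print(repr(cleaned))
```

This makes up to twenty passes over the string, and on unusual inputs the sequential passes can behave differently from a single regex substitution, so the regex version is usually the simpler choice here.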