Question

I believe I have set up the expression correctly, but the split does not work as intended.

c = re.compile(r'(?<=^\d\.\d{1,2})\s+');
for header in ['1.1 Introduction', '1.42 Appendix']:
    print re.split(c, header)

Expected result:

['1.1', 'Introduction']
['1.42', 'Appendix']

Instead, I get the following stack trace:

Traceback (most recent call last):
  File "foo.py", line 1, in <module>
    c = re.compile(r'(?<=^\d\.\d{1,2})\s+');
  File "C:\Python27\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern
<<< Process finished. (Exit code 1)

Answer 1

Lookbehinds in Python's re module cannot be variable-width, so your lookbehind is invalid.

You can use a capturing group as a workaround:

import re

c = re.compile(r'(^\d\.\d{1,2})\s+')
for header in ['1.1 Introduction', '1.42 Appendix']:
    # Drop the first element: it is the empty string before the match
    print(re.split(c, header)[1:])

Output:

['1.1', 'Introduction']
['1.42', 'Appendix']
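
To see why the [1:] slice is needed, here is a minimal sketch that prints the raw result of re.split with a capturing group. It assumes Python 3 (the original post uses Python 2.7), and the helper name split_header is hypothetical, introduced only for illustration.

```python
import re

# Capturing group: the matched section number is kept in the split result.
HEADER_RE = re.compile(r'(^\d\.\d{1,2})\s+')

def split_header(header):
    """Hypothetical helper: split 'N.NN Title' into [number, title]."""
    parts = HEADER_RE.split(header)
    # parts[0] is the empty string that precedes the match at the start.
    return parts[1:]

raw = HEADER_RE.split('1.42 Appendix')
print(raw)                             # ['', '1.42', 'Appendix']
print(split_header('1.42 Appendix'))   # ['1.42', 'Appendix']
```

Note that a header which does not match the pattern makes this helper return an empty list, because re.split then returns the whole string as its only element.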

Answer 2

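The error in your regex is in the {1,2} part: lookbehinds need to be fixed-width, so quantifiers that can match a varying number of characters are not allowed inside them.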
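
As a rough illustration of that restriction, the sketch below checks which patterns the re module accepts. It assumes Python 3, and the helper name compiles is hypothetical. A fixed count such as {2} is accepted, while the range {1,2} is rejected.

```python
import re

def compiles(pattern):
    """Hypothetical helper: return True if the re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed_ok = compiles(r'(?<=^\d\.\d{2})\s+')       # always exactly 4 characters wide
variable_ok = compiles(r'(?<=^\d\.\d{1,2})\s+')  # 3 or 4 characters wide

print(fixed_ok)     # True
print(variable_ok)  # False
```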

Try using this website to test your regular expressions before putting them into your code.

In your case, though, you do not need a regular expression at all.

Try this:

for header in ['1.1 Introduction', '1.42 Appendix']:
    # Split on a single space character
    print(header.split(' '))

Result:

['1.1', 'Introduction']
['1.42', 'Appendix']

Hope this helps.
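
One caveat with a plain split(' ') is that a title containing spaces, or a header with repeated spaces, produces more than two pieces. The sketch below limits the split to the first run of whitespace. It assumes Python 3; the helper name split_once and the header '1.42 Appendix A' are made up for illustration. Unlike the regex, it does not check that the first part looks like a section number.

```python
def split_once(header):
    """Hypothetical helper: split on the first run of whitespace only."""
    # sep=None splits on any whitespace run; maxsplit=1 stops after one split.
    return header.split(None, 1)

for header in ['1.1 Introduction', '1.42 Appendix A']:
    print(split_once(header))

# For comparison, a plain split(' ') also breaks up the title.
naive = '1.42 Appendix A'.split(' ')
print(naive)
```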

Answer 3

My solution may look clumsy, but you are only checking for one or two digits after the dot, so you can use two lookbehinds, one for each width:

c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')
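
A minimal sketch of how this pattern behaves on the headers from the question, assuming Python 3. Each lookbehind is fixed-width on its own (four and three characters), and the outer group is non-capturing, so nothing extra appears in the split result.

```python
import re

# Two fixed-width lookbehinds joined by alternation outside the assertions.
c = re.compile(r'(?:(?<=^\d\.\d\d)|(?<=^\d\.\d))\s+')

results = [c.split(header) for header in ['1.1 Introduction', '1.42 Appendix']]
for parts in results:
    print(parts)
# ['1.1', 'Introduction']
# ['1.42', 'Appendix']
```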

Python: splitting with a variable-width positive lookbehind

3 answers: