Python regex: splitting on numbers that stand alone on a line

Time: 2017-04-18 18:32:00

Tags: python regex

I want to split this text on the numbers that stand alone on a line.

1
root -0.307087 17.6356 -28.2214 2.36076 1.44212 -4.54601
lowerback 15.4094 -0.182495 1.65268
upperback 1.54579 0.0318172 -0.110122
thorax -6.9977 -0.0335751 -1.06068
lowerneck -3.24163 -0.676991 -1.34632
upperneck -9.28199 -0.818331 1.08102
head -2.3551 -0.388697 0.578143
rclavicle 1.74931e-014 -4.77083e-015
rhumerus -42.2757 19.3184 -90.6312
rradius 79.2191
rwrist 2.46902
rhand -35.8906 32.487
rfingers 7.12502
rthumb -9.00425 2.69918
lclavicle 1.74931e-014 -4.77083e-015
lhumerus -46.581 -10.5126 91.072
lradius 108.082
lwrist 30.7395
lhand -39.5085 13.512
lfingers 7.12502
lthumb -12.4939 43.1185
rfemur 4.30283 -1.72433 25.7796
rtibia 82.7602
rfoot 27.83 -8.73877
rtoes 20.2614
lfemur -27.49 -2.09007 -20.1015
ltibia 38.398
lfoot -7.19848 -5.78026
ltoes 5.97973
2
root -0.303728 17.5624 -27.7253 2.02549 1.77071 -4.33872
lowerback 16.0608 -0.380636 1.35189
upperback 1.68665 -0.267024 -0.0539964
thorax -7.21419 -0.169571 -0.765959
lowerneck -2.88855 -0.493739 -1.55908
upperneck -9.88628 -0.567977 1.15901
head -2.623 -0.258251 0.642519
rclavicle -7.65321e-015 -2.38542e-015
rhumerus -42.619 18.2084 -90.2387
rradius 76.8375
rwrist 5.33346
rhand -37.643 32.4997
rfingers 7.12502
rthumb -10.695 2.7919
lclavicle -7.65321e-015 -2.38542e-015
lhumerus -43.8177 -11.0502 91.3641
lradius 108.431
lwrist 30.2025
lhand -38.9758 12.3082
lfingers 7.12502
lthumb -11.9803 41.9454
rfemur 1.76685 -3.0026 24.5235
rtibia 87.0878
rfoot 27.0955 -9.32294
rtoes 22.2194
lfemur -26.5572 -2.78834 -20.4876
ltibia 40.7855
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
lowerback 16.9292 -0.51999 1.14183
upperback 1.81465 -0.483798 -0.143209
thorax -7.55951 -0.270454 -0.690263
lowerneck -2.59928 -0.313935 -1.56078
upperneck -10.5834 -0.320817 1.24057
head -2.91503 -0.136576 0.671345
rclavicle -1.54058e-014 -3.97569e-015
rhumerus -42.9367 16.607 -89.7942
rradius 74.9122
rwrist 7.29535
rhand -38.4744 33.0964
rfingers 7.12502
rthumb -11.4968 3.43167
lclavicle -1.54058e-014 -3.97569e-015
lhumerus -40.8446 -11.9999 91.445
lradius 108.671
lwrist 29.7854
lhand -38.5919 11.658
lfingers 7.12502
lthumb -11.6101 41.3163
rfemur -0.94671 -4.033 23.2605
rtibia 91.2781
rfoot 26.5333 -9.15277
rtoes 23.1538
lfemur -25.0499 -3.27418 -20.9658
ltibia 42.1017
lfoot -12.067 -2.99804
ltoes -2.17676

Ideally, I want to get the content between the standalone numbers, without including the numbers themselves. I tried this pattern:

r"[0-9]+(?<=)[\r\n]"

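I was hoping to match a number with nothing before it on the line, followed by a newline.

What is the correct pattern for this?

1 answer:

Answer 0: (score: 2)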

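Your regex attempt cannot work, for several reasons. For instance, it consumes the trailing digits of the decimal numbers, because the match is not anchored by a newline at its start. The lookbehind (?<=) doesn't make sense either (it is empty), and you don't need it.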
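To make the first problem concrete, here is a small sketch on a made-up sample string (not the asker's full data). The empty lookbehind is left out of the pattern because it asserts nothing and does not affect what matches:

```python
import re

# Hypothetical sample: a decimal line, a standalone number, another decimal line.
sample = "root -0.307087 17.6356\n1\nhead 2.5\n"

# The asker's pattern without the empty lookbehind.
matches = re.findall(r"[0-9]+[\r\n]", sample)
print(matches)  # ['6356\n', '1\n', '5\n']
```

Only the middle match is a standalone number. The other two are the tails of 17.6356 and 2.5, which is why the pattern needs a newline in front of the digits as well.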

I would split on a "digits" regex enclosed between 2 newlines (with an optional carriage return before each newline, just in case).

Test:

import re

text = """rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854"""


print(re.split(r"\r?\n\d+\r?\n",text))

result: ['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854']

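Note that this simple approach doesn't handle the case where the text starts or ends with a number alone on a line. We'd have to complicate the pattern by adding the ^ and $ cases, but then the capturing groups make re.split return the lone newlines as well, and empty fields are emitted too. So we can apply a corrective list comprehension to filter out the "blank" fields (this can probably be done with pure regex):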

text = """1
rfoot 27.0955 -9.32294
lfoot -10.1476 -3.85256
ltoes 0.48001
3
root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517
rwrist 7.29535
5
rhand -38.4744 33.0964
lradius 108.671
lwrist 29.7854
4"""


print([x for x in re.split(r"(^|\r?\n)\d+(\r?\n|$)",text) if x.strip()])

Result:

['rfoot 27.0955 -9.32294\nlfoot -10.1476 -3.85256\nltoes 0.48001', 'root -0.294208 17.4728 -27.2384 1.62853 1.94279 -4.06517\nrwrist 7.29535', 'rhand -38.4744 33.0964\nlradius 108.671\nlwrist 29.7854']
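One possible variant, offered as a sketch rather than as the answerer's own code, replaces the capturing groups with non-capturing ones, (?:...). re.split then no longer returns the newlines, so the filter only has to drop the empty strings left at the two ends. The sample text below is a shortened, made-up version of the data above:

```python
import re

# Shortened sample that starts and ends with a standalone number.
text = "1\nrfoot 27.0955 -9.32294\nltoes 0.48001\n3\nroot -0.294208 17.4728\n5\nlwrist 29.7854\n4"

# Non-capturing groups: the separators are not included in the split result.
parts = re.split(r"(?:^|\r?\n)\d+(?:\r?\n|$)", text)

# An empty string remains before the leading number and after the trailing one.
blocks = [p for p in parts if p]
print(blocks)
# ['rfoot 27.0955 -9.32294\nltoes 0.48001', 'root -0.294208 17.4728', 'lwrist 29.7854']
```

This still assumes each standalone number is separated from the next one by at least one data line, as in the question's data.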