
Question

Say I have a string:

teststring =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"

I want:

testlist = ["1.3 Hello how are you", "1.4 I am fine, thanks 1.2 Hi There", "1.5 Great!"]

Basically, split only on numbers that increase, where the difference is .1 (e.g. 1.2 to 1.3).

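Is there a way to split this with a regex, but only capture the increasing sequence numbers? I wrote Python code that iterates with a custom re.compile() for each number, which works but is extremely clunky.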

Something like this (where parts1_temp is the given list of x.x numbers in the string):

import re

parts1_temp = ['1.3', '1.4', '1.2', '1.5']
# Build the expected sequence starting from the first number in the list
major, minor = parts1_temp[0].split('.')
parts_num = range(int(minor), int(minor) + 30)
parts_search = ['.'.join([major, str(parts_num_el)]) for parts_num_el in parts_num]
# parts_search should be ['1.3', '1.4', '1.5', ..., '1.32']

parts_fin = []
for k in range(len(parts_search) - 1):
    # Match from one expected number up to (but not including) the next expected number
    rxtemp = re.compile(r"(?:" + re.escape(parts_search[k]) + r")([\s\S]*?)(?=(?:" + re.escape(parts_search[k+1]) + r"))", re.MULTILINE)
    parts_fin.extend(match.group(0) for match in rxtemp.finditer(teststring))

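But man, is it ugly. Is there a more direct way to do this in regex? I imagine this is a feature someone has wanted from regex at some point, but I can't find any ideas on how to tackle it (maybe it isn't possible with pure regex).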

Answer 1

Doing this with regex alone seems overly complicated. How about this approach:

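import re

teststring = "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
res = []
expected = None
# Tokenize into numbers (with optional decimals) and runs of non-digits
for s in re.findall(r'\d+(?:\.\d+)?|\D+', teststring):
    if s[0].isdigit() and expected is None:
        # The first number fixes the format (decimal places) and the increment
        expected = s
        fmt = '{0:.' + str(max(0, len(s) - (s+'.').find('.') - 1)) + 'f}'
        inc = float(re.sub(r'\d', '0', s)[0:-1] + '1')
    if s == expected:
        # Expected number found: start a new part and compute the next expected number
        res.append(s)
        expected = fmt.format(float(s) + inc)
    elif expected:
        # Anything else is appended to the current part
        res[-1] = res[-1] + s

print(res)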

This also works if the numbers happen to have 2 or more decimal places, or none at all.
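To check that claim, here is a small sketch that wraps the same loop in a function and runs it on two-decimal and integer inputs. The function name split_on_increments and the sample strings are made up for illustration; the logic is the loop above, unchanged.

```python
import re

def split_on_increments(text):
    """Hypothetical wrapper around the loop above; returns the list of parts."""
    res = []
    expected = None
    for s in re.findall(r'\d+(?:\.\d+)?|\D+', text):
        if s[0].isdigit() and expected is None:
            expected = s
            fmt = '{0:.' + str(max(0, len(s) - (s + '.').find('.') - 1)) + 'f}'
            inc = float(re.sub(r'\d', '0', s)[0:-1] + '1')
        if s == expected:
            res.append(s)
            expected = fmt.format(float(s) + inc)
        elif expected:
            res[-1] = res[-1] + s
    return res

# Two decimal places: the step becomes 0.01
print(split_on_increments("2.05 a 2.06 b 2.04 c 2.07 d"))
# ['2.05 a ', '2.06 b 2.04 c ', '2.07 d']

# No decimals: the step becomes 1
print(split_on_increments("7 a 8 b 6 c 9 d"))
# ['7 a ', '8 b 6 c ', '9 d']
```

Note that the parts keep their trailing spaces; call strip() on each one if that matters.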

Answer 2

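This method uses finditer to find every position of \d+\.\d+, then tests whether the match is numerically greater than the previous one. If the test is true, the index is appended to the indices array.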

The last line uses a list comprehension, taken from this answer, to split the string at those given indices.
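In isolation, that index-splitting step looks roughly like this. The helper name split_at is only for illustration. Pairing each index with the next one (and the last index with None) yields the slice boundaries.

```python
def split_at(text, indices):
    """Cut text into slices that start at each index (hypothetical helper)."""
    # zip pairs (i0, i1), (i1, i2), ..., (i_last, None); a None end means "to the end"
    return [text[i:j] for i, j in zip(indices, indices[1:] + [None])]

print(split_at("abcdefgh", [0, 3, 5]))  # ['abc', 'de', 'fgh']
print(split_at("abcdefgh", [2, 5]))     # ['cde', 'fgh']
```

As the second call shows, anything before the first index is dropped.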

Original method

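This method ensures that the previous match is less than the current match. It does not work sequentially; it works on numeric magnitude. So, given a string with the numbers 1.1, 1.2, 1.4, it will split at every occurrence, because each number is greater than the last.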

See code in use here

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if lastFloat < x:
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])

Output: ['1.3 Hello how are you ', '1.4 I am fine, thanks 1.2 Hi There ', '1.5 Great!']
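To illustrate the 1.1, 1.2, 1.4 case, here is the same logic wrapped in a function and run on a made-up string. The name split_on_larger and the sample input are assumptions for the example.

```python
import re

def split_on_larger(text):
    """Split wherever a number exceeds the last number that caused a split."""
    indices = []
    last = 0.0
    for m in re.finditer(r"\d+\.\d+", text):
        x = float(m.group())
        if last < x:
            last = x
            indices.append(m.start())
    return [text[i:j] for i, j in zip(indices, indices[1:] + [None])]

print(split_on_larger("1.1 a 1.2 b 1.4 c 1.3 d"))
# ['1.1 a ', '1.2 b ', '1.4 c 1.3 d']
```

It splits at 1.4 even though 1.3 was skipped, and the later 1.3 no longer causes a split because it is smaller than 1.4.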

Edit

Sequential method

This method is very similar to the original one, but in the case of 1.1, 1.2, 1.4 it won't split on 1.4, since 1.4 does not follow sequentially given the .1 step.

The method below differs only in the if statement, so this logic can be customized to whatever your needs are.

See code in use here

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if (lastFloat == 0) or (x == round(lastFloat + .1, 1)):
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])
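The round() call in the if statement matters because 1.3 + .1 is not exactly 1.4 in binary floating point. As an alternative sketch (not what the code above does), the same sequential check can be written with decimal.Decimal, which compares the numbers exactly as written. The function name split_sequential and the step parameter are assumptions for the example.

```python
import re
from decimal import Decimal

def split_sequential(text, step="0.1"):
    """Split only at numbers that are exactly `step` above the last split number."""
    indices = []
    last = None
    for m in re.finditer(r"\d+\.\d+", text):
        x = Decimal(m.group())
        if last is None or x == last + Decimal(step):
            last = x
            indices.append(m.start())
    return [text[i:j] for i, j in zip(indices, indices[1:] + [None])]

print(1.3 + 0.1 == 1.4)  # False, which is why the float version needs round()
print(split_sequential("1.1 a 1.2 b 1.4 c 1.3 d"))
# ['1.1 a ', '1.2 b 1.4 c ', '1.3 d']
```

Here 1.4 is skipped because it does not follow 1.2 by one step, while the later 1.3 does and starts a new part.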

Answer 3

You can also alter the string so that a marker is placed next to a number if it is part of the increasing sequence. Then you can split at that marker:

import re
teststring =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!" 
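numbers = re.findall(r'[.\d]+', teststring)
# Prefix a number with '*' unless it is smaller (compared as a string) than the number right before it
marked = [numbers[0]] + [numbers[i] if numbers[i] < numbers[i-1] else '*' + numbers[i] for i in range(1, len(numbers))]
final_string = re.sub(r'[.\d]+', '{}', teststring).format(*marked).split(' *')
print(final_string)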

Output:

['1.3 Hello how are you', '1.4 I am fine, thanks 1.2 Hi There', '1.5 Great!']
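A variation on the same marker idea, sketched with a re.sub callback instead of str.format, so literal braces in the text cause no trouble. This is not the code above: it compares floats against the last number that started a part, rather than strings against the immediately preceding number. The function name split_with_marker and the NUL marker character are assumptions, and the marker must not occur in the input.

```python
import re

def split_with_marker(text, marker="\x00"):
    """Insert `marker` before each number larger than the last split number, then split."""
    last = None

    def mark(match):
        nonlocal last
        number = float(match.group())
        if last is None:
            # The first number starts the first part; no marker needed
            last = number
            return match.group()
        if number > last:
            last = number
            return marker + match.group()
        return match.group()

    marked = re.sub(r"\d+\.\d+", mark, text)
    return [part.strip() for part in marked.split(marker)]

teststring = "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
print(split_with_marker(teststring))
# ['1.3 Hello how are you', '1.4 I am fine, thanks 1.2 Hi There', '1.5 Great!']
```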

Regex split on an increasing sequence of numbers in Python

3 answers:
