Replacing characters in a long string to split it while keeping the 'separators'

Time: 2014-09-01 08:35:22

Tags: python

Hi everyone. Thanks to the post "Using Python: to split long string, by given ‘separators’", I learned how to split a long string.

However, when the string is split, the 'separators' are lost:

import re

text = "C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/42006Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas2007EngineCylinder:4VerticalInline2008Bore:1Stroke:1Cycle:42007Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/162008Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:92006Weight:4LBS1.75H.P.@65200RPM"

a = ['2006', '2007', '2008', '2009']

seperators = re.compile(r'|'.join(a))

e = seperators.split(text)

for f in e:
    print f

The result is:

C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4   # '2006' is missing
Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas   # '2007' is missing
EngineCylinder:4VerticalInline   # '2008' is missing
Bore:1Stroke:1Cycle:4   # '2007' is missing
Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16   # '2008' is missing
Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9   # '2006' is missing
Weight:4LBS1.75H.P.@65200RPM   

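I want to keep the 'separators' when splitting. One approach I tried is to append a special marker to each 'separator' and then split the long string on that marker (below, the marker is '@@@'; I know this is not a clever approach):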

a = ['2006', '2007', '2008', '2009']

b = []

for eachone in a:
    b.append(eachone + '@@@')

my_dic = dict(zip(a, b))

for e, f in my_dic.iteritems():
    new_text = ''.join(text.replace(e, f))

However, some of the separators in the original string are not replaced. Why?

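Also, is my marker approach to splitting the long string while keeping the 'separators' unnecessary? (I have checked other posts, but with my limited understanding I could not find an answer.)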

Thanks.

2 Answers:

Answer 0: (score: 1)

Use a capturing group in the regular expression:

seperators = re.compile(r'(' + r'|'.join(a) + r')')

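This way the separators are kept: when the pattern contains a capturing group, re.split also returns the text matched by that group as items in the resulting list.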
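
A minimal, self-contained sketch of this behaviour, written for Python 3 and using a shortened sample string rather than the full text from the question. The re.escape call is an addition that is not part of the answer above; it makes no difference for purely numeric separators, but it guards against separators that contain regex metacharacters.

```python
import re

# Shortened sample in the style of the question's text (not the full string).
text = "Bore:1-1/42006Stroke:1-1/82007Engine2008Weight:4LBS"
a = ['2006', '2007', '2008', '2009']

# Wrapping the alternation in (...) makes it a capturing group, so
# re.split returns the matched separators as list items as well.
seperators = re.compile('(' + '|'.join(re.escape(s) for s in a) + ')')
parts = seperators.split(text)

for part in parts:
    print(part)
```

Each separator comes back as its own list item, sitting between the two pieces it separated.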

Answer 1: (score: 1)

If you use a capturing group in the regular expression, you get the desired result:

seperators = re.compile(r'(%s)' % '|'.join(a))

Output:

C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4
2006
Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas
2007
EngineCylinder:4VerticalInline
2008
Bore:1Stroke:1Cycle:4
2007
Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16
2008
Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9
2006
Weight:4LBS1.75H.P.@65200RPM
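
As for why the '@@@' loop in the question leaves separators unreplaced: each iteration calls text.replace on the original, unchanged text and overwrites new_text, so only the replacement made in the last iteration survives (and ''.join(...) applied to a string has no useful effect). A sketch of a corrected loop for Python 3 that chains the replacements instead, using a shortened sample string; the names marker and pieces are illustrative, not from the question:

```python
# Shortened sample in the style of the question's text (not the full string).
text = "Bore:12006Stroke:12007Weight:6"
a = ['2006', '2007', '2008', '2009']
marker = '@@@'  # assumed never to occur in the text itself

# Start from the original text and keep replacing in the SAME string,
# so every separator's replacement is preserved.
new_text = text
for sep in a:
    new_text = new_text.replace(sep, sep + marker)

# Splitting on the marker leaves each separator at the end of its piece.
pieces = new_text.split(marker)
print(pieces)
```

This only works while the marker never appears in the data; the regex approaches in the answers need no marker at all.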

If you want to keep each separator at the end of the preceding string, use findall instead of split:

seperators = re.compile(r'.*?(?:%s|$)' % '|'.join(a))
e = seperators.findall(text)

Output:

C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/42006
Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas2007
EngineCylinder:4VerticalInline2008
Bore:1Stroke:1Cycle:42007
Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/162008
Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:92006
Weight:4LBS1.75H.P.@65200RPM
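
One detail worth checking when running this: because the pattern can also match the empty string at the very end of the text, findall may return a trailing empty string after the last piece. A small sketch for Python 3, again with a shortened sample string rather than the question's full text, that filters such empty items out:

```python
import re

# Shortened sample in the style of the question's text (not the full string).
text = "Bore:1-1/42006Stroke:1-1/82007Engine2008Weight:4LBS"
a = ['2006', '2007', '2008', '2009']

# Lazily match up to and including the next separator, or up to the end of the text.
seperators = re.compile(r'.*?(?:%s|$)' % '|'.join(a))

# Drop empty matches such as the one findall can produce at the end of the text.
e = [piece for piece in seperators.findall(text) if piece]

for f in e:
    print(f)
```

Note that . does not match newlines by default, so for multi-line input the pattern would need the re.DOTALL flag.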