Python basics - splitting a long string and combining the pieces into the desired fragments

Time: 2014-08-26 07:11:33

Tags: python list

Environment: Windows 7; Python 2.7.6

Hi everyone, I need to extract some pieces of text from a string like this:

"C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGasEngineCylinder:4VerticalInlineBore:1Stroke:1Cycle:4Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9Weight:4LBS1.75H.P.@65200RPM"

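What I want is:

I. Combinations of 1 letter + 3 digits, joined by '-'. For example: C-603, K-720, C-606, etc.

II. Runs of 5 consecutive digits. For example: 45256, 54500, 60150, 65200, etc.

My idea is: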

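  1. Cut the string into single characters, e.g. 'C', '-', '6', '0', '3', ... 'R', 'P', 'M'
  2. Combine them into 4-character and 5-character groups, e.g. 'C-60', '-603', '603W', ... and 'C-603', '-603W', '603Wa'
  3. Pick out the groups that meet criteria I and II

Does that sound like a reasonable approach? If so, which commands can I use along the way? Thanks.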
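For reference, the sliding-window idea from the steps above can be sketched directly. This is only an illustration of the asker's plan, not the approach the answer takes; the helper names `windows`, `is_code`, and `is_five_digits` are made up for this sketch, and the sample is just the first part of the string.

```python
import string

def windows(text, size):
    """Return every substring of `text` that is exactly `size` characters long."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]

def is_code(chunk):
    # Criterion I: one ASCII letter, a hyphen, then three ASCII digits.
    return (len(chunk) == 5
            and chunk[0] in string.ascii_letters
            and chunk[1] == '-'
            and all(ch in string.digits for ch in chunk[2:]))

def is_five_digits(chunk):
    # Criterion II: five ASCII digits in a row.
    return len(chunk) == 5 and all(ch in string.digits for ch in chunk)

# Assumption: a short prefix of the full string, used to keep the example small.
sample = 'C-603WallWizard45256CCCylinders:2'

chunks = windows(sample, 5)          # steps 1 and 2: all 5-character groups
codes = [c for c in chunks if is_code(c)]            # step 3, criterion I
numbers = [c for c in chunks if is_five_digits(c)]   # step 3, criterion II

print(codes)    # ['C-603']
print(numbers)  # ['45256']
```

Both criteria describe 5-character pieces, so a single window size of 5 is enough here; the 4-character groups from step 2 are not needed.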

1 answer:

Answer 0: (score: 1)

Using regular expressions is one way:

>>> data = '''C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGasEngineCylinder:4VerticalInlineBore:1Stroke:1Cycle:4Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9Weight:4LBS1.75H.P.@65200RPM'''

>>> import re
>>> one_letter_three_numbers = re.compile(r'.\-\d{3}', re.IGNORECASE)
>>> re.findall(one_letter_three_numbers, data)
['C-603', '1-111', 'K-720', 'C-601', 'L-233', 'c-609', 'C-600', 'P-305', 'C-606', 'J-142']
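Note that `.` matches any character, which is why '1-111' shows up in the result, and that 'C-601' is the start of 'C-60150'. If criterion I is meant strictly (a letter, a hyphen, exactly three digits), a tighter pattern might look like the sketch below. Whether 'C-601' should count is a guess about the asker's intent, so both variants are shown. The sample string is stitched together from pieces of the data to keep the example short.

```python
import re

# Assumption: a shortened sample built from fragments of the full string.
sample = ('C-603WallWizard45256CCBore:1-111Height:6Width:K-720'
          'Cooling:Air@54500RPMC-60150cc')

# Pattern from the answer: any character, a hyphen, three digits.
any_char = re.compile(r'.-\d{3}')

# Variant 1: require an ASCII letter before the hyphen.
letter_only = re.compile(r'[A-Za-z]-\d{3}')

# Variant 2: additionally reject matches followed by a fourth digit.
exactly_three = re.compile(r'[A-Za-z]-\d{3}(?!\d)')

print(any_char.findall(sample))       # ['C-603', '1-111', 'K-720', 'C-601']
print(letter_only.findall(sample))    # ['C-603', 'K-720', 'C-601']
print(exactly_three.findall(sample))  # ['C-603', 'K-720']
```

The character class `[A-Za-z]` already covers both cases, so `re.IGNORECASE` is not needed for these variants.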

>>> five_continuous = re.compile(r'\d{5}', re.IGNORECASE)
>>> re.findall(five_continuous, data)
['45256', '54500', '60150', '65200']
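One caveat: `\d{5}` also matches the first five digits of a longer run. That does not happen in the string above, but if it could occur in other input, lookarounds are one way to require exactly five digits. A minimal sketch, where the trailing 'Serial1234567' is an invented fragment added only to show the difference:

```python
import re

# Assumption: fragments of the original data plus a hypothetical 7-digit run.
sample = 'Wizard45256CC@54500RPMC-60150ccSerial1234567'

# Pattern from the answer: any five digits in a row.
five_digits = re.compile(r'\d{5}')

# Stricter variant: five digits not preceded or followed by another digit.
exactly_five = re.compile(r'(?<!\d)\d{5}(?!\d)')

print(five_digits.findall(sample))   # ['45256', '54500', '60150', '12345']
print(exactly_five.findall(sample))  # ['45256', '54500', '60150']
```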