I am struggling to get the correct output using Python regular expressions.
My file contains strings like this:
80H236M7I106M2885H
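The letters next to the integers can be any of: IDMSH
I am trying to get the start and end of certain parts of my string (the S and H parts; these are always at the beginning, the end, or both ends of the string). The correct output for my example string would be: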
0 80 80H236M7I106M2885H
429 3314 80H236M7I106M2885H
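(Basically, we add up all the numbers until we reach one of the blocks we are searching for, and then set that block's start and end.) By the way, it would also be nice to know whether the part is at the start or the end of the string, for example: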
0 80 80H236M7I106M2885H start
429 3314 80H236M7I106M2885H end
I used code like this:
import re

pcigar = "80H236M7I106M2885H"  # example string from above
insstart = 0
insend = 0
for num1, i_or_d in re.findall(r'(\d+)([HISDM])', pcigar):
    if i_or_d in 'S':
        insstart == insstart  # comparison only; insstart is left unchanged
        insend += int(num1)
    elif i_or_d in 'H':
        insstart == insstart  # comparison only; insstart is left unchanged
        insend += int(num1)
    elif i_or_d in 'M':
        insstart += int(num1)
        insend += int(num1)
    elif i_or_d in 'I':
        insstart += int(num1)
        insend += int(num1)
    if i_or_d in 'H' or i_or_d in 'S':
        print(insstart, insend, pcigar)
However, it outputs:
0 80 80H236M7I106M2885H
349 3314 80H236M7I106M2885H
Can anyone help me get the correct output? Cheers, Irek
Answer 0 (score: 1)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

pcigar = "80H236M7I106M2885H"
insstart = 0
insend = 0
temp = 0  # offset left behind by clipped blocks already seen (80 after the leading 80H)

for num1, i_or_d in re.findall(r'(\d+)([HISDM])', pcigar):
    if i_or_d in 'SH':
        # Clipped block: shift the start by temp, extend the end,
        # remember the running end in temp, then report the block.
        insstart = insstart + temp
        insend += int(num1)
        temp += insend
        print(insstart, insend, pcigar)
    elif i_or_d in 'MI':
        # Match or insertion: advance both coordinates.
        insstart += int(num1)
        insend += int(num1)
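For reference, the 349 in the question is 236 + 7 + 106: the leading 80H was added to insend but never to insstart. Below is a minimal sketch of an alternative that keeps a single running offset instead of two counters, and also reports whether each clipped block sits at the start or the end. The function name clipped_blocks and its parameters are hypothetical, made up for this illustration. It assumes that D adds nothing to the coordinates (as in the code above, where D is matched but never counted) and that a clipped block counts as "start" only if no M or I has been seen before it.

```python
import re


def clipped_blocks(cigar, clip_ops="SH", ignored_ops="D"):
    """Return (start, end, where) for every clipped block in cigar.

    Hypothetical helper: clip_ops are the operations to report,
    ignored_ops are matched by the regex but do not move the offset.
    """
    blocks = []
    offset = 0            # sum of all counted lengths seen so far
    seen_aligned = False  # True once a non-clip operation has been read
    for length, op in re.findall(r"(\d+)([HISDM])", cigar):
        if op in ignored_ops:
            continue  # assumption: D contributes nothing to the coordinates
        end = offset + int(length)
        if op in clip_ops:
            where = "end" if seen_aligned else "start"
            blocks.append((offset, end, where))
        else:
            seen_aligned = True
        offset = end  # the next block starts where this one ended
    return blocks


pcigar = "80H236M7I106M2885H"
for start, end, where in clipped_blocks(pcigar):
    print(start, end, pcigar, where)
# 0 80 80H236M7I106M2885H start
# 429 3314 80H236M7I106M2885H end
```

Because every counted operation advances the same offset, the start of a trailing clip is simply the sum of everything before it, so no extra temp variable is needed. If deletions should count toward the coordinates in your data, pass ignored_ops="".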