For loop with a regular expression does not behave as expected

Time: 2014-08-25 12:32:39

Tags: python regex

I have a string value called 'match3g' in my code, which prints data in the following format:

10272,201,Halifax,1,3333,1,33,2,33
,10272,989,Forest Green,2,3331,3,33,9,31
,10272,203,Lincoln,3,1333,6,13,4,33
,10272,206,Barnet,4,3303,11,30,1,33
,10272,199,Wrexham,5,3033,15,03,3,33
,10272,749,Kidderminster,6,1331,2,33,13,11
,10272,205,Macclesfield,7,3311,8,31,8,31
,10272,6106,Eastleigh,8,3310,7,31,10,30
,10272,1392,Aldershot,9,3031,5,31,12,03
,10272,921,Gateshead,10,3310,16,30,6,31
,10272,164,Grimsby,11,1113,18,11,5,13
,10272,991,Woking,12,3111,19,11,7,31
,10272,204,Torquay,13,0311,4,31,17,01
,10272,919,Southport,14,0013,14,03,18,01
,10272,185,Bristol Rovers,15,1003,9,13,22,00
,10272,909,Dover,16,0013,13,03,19,01
,10272,3551,Braintree Town,17,0300,10,30,20,00
,10272,1389,Altrincham,18,0030,12,03,21,00
,10272,213,Chester,19,0030,24,00,11,03
,10272,6140,Dartford,20,0101,20,01,15,10
,10272,1395,Welling,21,1001,17,11,24,00
,10272,982,Telford,22,1000,22,00,14,10
,10272,913,Nuneaton,23,0100,23,00,16,10
,10272,2792,Alfreton,24,0000,21,00,23,00

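On each line, I want to take the string of digits that follows the 4th comma and separate its digits with commas. I have some code that does this, but it only parses the last line: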

regex2 = re.compile(r'\d+(?=(?:,[^,]+){4}$)',re.S)
regexer2 = re.search(regex2, match3g)
regexer2 = regexer2.group()
regexer3 = ','.join(list(regexer2))
regexs2 = str(regexer3)
print "Test = ", regexs2.decode()

This prints:

Test = 0,0,0,0

However, I want every line of data in 'match3g' to go through the same steps, so I tried using a 'for' loop as follows:

for line in match3g:
                regex2 = re.compile(r'\d+(?=(?:,[^,]+){4}$)',re.S)
                regexer2 = re.search(regex2, match3g)
                if regexer2 is not None:
                    regexer2 = regexer2.group()
                    regexer3 = ','.join(list(regexer2))
                    regexs2 = str(regexer3)
                    print "Test = ", regexs2.decode()

Instead of giving me the result I want, this prints:

Test - 0,0,0,0
Test - 0,0,0,0
Test - 0,0,0,0
Test - 0,0,0,0
....lots of lines of this
Test - 0,0,0,0
Test - 0,0,0,0
Test - 0,0,0,0
Test - 0,0,0,0

My expected output is:

Test = 3,3,3,3
Test = 3,3,3,1
Test = 1,3,3,3
Test = 3,3,0,3
Test = 3,0,3,3
Test = 1,3,3,1
Test = 3,3,1,1
Test = 3,3,1,0
Test = 3,0,3,1
Test = 3,3,1,0
Test = 1,1,1,3
Test = 3,1,1,1
Test = 0,3,1,1
Test = 0,0,1,3
Test = 1,0,0,3
Test = 0,0,1,3
Test = 0,3,0,0
Test = 0,0,3,0
Test = 0,0,3,0
Test = 0,1,0,1
Test = 1,0,0,1
Test = 1,0,0,0
Test = 0,1,0,0
Test = 0,0,0,0

Can anyone see where I am going wrong? It looks like I am almost there.

Thanks

3 answers:

Answer 0: (score: 1)

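# s is the multi-line string from the question (match3g there)
for line in s.split("\n"):
    # split off the last five fields; the first of them is the digit string
    spl = line.rsplit(",", 5)[-5:-4]
    if spl:
        print "Test = {}".format(",".join(spl[0]))

Output:
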
Test = 3,3,3,3
Test = 3,3,3,1
Test = 1,3,3,3
Test = 3,3,0,3
Test = 3,0,3,3
Test = 1,3,3,1
Test = 3,3,1,1
Test = 3,3,1,0
Test = 3,0,3,1
Test = 3,3,1,0
Test = 1,1,1,3
Test = 3,1,1,1
Test = 0,3,1,1
Test = 0,0,1,3
Test = 1,0,0,3
Test = 0,0,1,3
Test = 0,3,0,0
Test = 0,0,3,0
Test = 0,0,3,0
Test = 0,1,0,1
Test = 1,0,0,1
Test = 1,0,0,0
Test = 0,1,0,0
Test = 0,0,0,0
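
As a side note on why the loop in the question repeats the same line: iterating over a string yields one character at a time rather than one line, and each iteration calls re.search on the whole of match3g instead of on line. Without re.MULTILINE, $ only matches at the end of the string, so the only match is in the last row (re.S only changes how . behaves, and the pattern contains no .). The following is a minimal sketch of that diagnosis and of a per-line fix, written for Python 3 with a shortened two-row sample; the helper name digits_per_line is an assumption for illustration and not part of the original code.

```python
import re

# Shortened sample in the same shape as the question's data (assumption: two rows only).
match3g = ("10272,201,Halifax,1,3333,1,33,2,33\n"
           ",10272,2792,Alfreton,24,0000,21,00,23,00")

# Iterating over a str yields single characters, not lines.
first_items = list(match3g)[:3]

# The question's pattern, without any flags.
pattern = re.compile(r'\d+(?=(?:,[^,]+){4}$)')

# Searching the whole string only ever finds the match in the last row.
whole_string_match = pattern.search(match3g).group()


def digits_per_line(text):
    """Hypothetical helper: apply the pattern to each line separately."""
    out = []
    for line in text.splitlines():
        m = pattern.search(line)  # search the current line, not the whole text
        if m is not None:
            out.append(','.join(m.group()))
    return out


for item in digits_per_line(match3g):
    print("Test =", item)
```

The only changes relative to the question's loop are splitting the text into lines first and passing line to the search.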

Answer 1: (score: 0)

I think this may be what you want, if you want 0,0,0,0 rather than 0000.

#!/usr/bin/python
#-*- coding:utf-8 -*-

import re

match3g = '''10272,201,Halifax,1,3333,1,33,2,33
,10272,989,Forest Green,2,3331,3,33,9,31
,10272,203,Lincoln,3,1333,6,13,4,33
,10272,206,Barnet,4,3303,11,30,1,33
,10272,199,Wrexham,5,3033,15,03,3,33
,10272,749,Kidderminster,6,1331,2,33,13,11
,10272,205,Macclesfield,7,3311,8,31,8,31
,10272,6106,Eastleigh,8,3310,7,31,10,30
,10272,1392,Aldershot,9,3031,5,31,12,03
,10272,921,Gateshead,10,3310,16,30,6,31
,10272,164,Grimsby,11,1113,18,11,5,13
,10272,991,Woking,12,3111,19,11,7,31
,10272,204,Torquay,13,0311,4,31,17,01
,10272,919,Southport,14,0013,14,03,18,01
,10272,185,Bristol Rovers,15,1003,9,13,22,00
,10272,909,Dover,16,0013,13,03,19,01
,10272,3551,Braintree Town,17,0300,10,30,20,00
,10272,1389,Altrincham,18,0030,12,03,21,00
,10272,213,Chester,19,0030,24,00,11,03
,10272,6140,Dartford,20,0101,20,01,15,10
,10272,1395,Welling,21,1001,17,11,24,00
,10272,982,Telford,22,1000,22,00,14,10
,10272,913,Nuneaton,23,0100,23,00,16,10
,10272,2792,Alfreton,24,0000,21,00,23,00'''


for line in match3g.split('\n'):
    a = line.split(',')
    if a[0]=='':
        print list(a[5])
    else:
        print list(a[4])
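
The loop above prints each digit string as a list, and it needs the if/else because rows after the first start with a comma. A possible variation, sketched here for Python 3, counts fields from the end of the row so the leading comma no longer matters, and joins the digits into the 0,0,0,0 form. The function name score_digits is hypothetical, and the sketch assumes every row has the same five trailing fields as the sample data.

```python
def score_digits(line):
    """Hypothetical helper: comma-join the digits of the fifth field from the end."""
    fields = line.split(',')
    # Counting from the end sidesteps the empty first field on rows
    # that begin with a comma.
    return ','.join(fields[-5])


rows = [
    "10272,201,Halifax,1,3333,1,33,2,33",
    ",10272,989,Forest Green,2,3331,3,33,9,31",
]
for row in rows:
    print("Test =", score_digits(row))
```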

Answer 2: (score: 0)

Rewritten after the OP's comment:

I am not very comfortable with Python, so apologies if I got the Python part wrong:

#!/usr/bin/python
#-*- coding:utf-8 -*-

import re

match3g = '''10272,201,Halifax,1,3333,1,33,2,33
,10272,989,Forest Green,2,3331,3,33,9,31
,10272,203,Lincoln,3,1333,6,13,4,33
,10272,206,Barnet,4,3303,11,30,1,33
,10272,199,Wrexham,5,3033,15,03,3,33
,10272,749,Kidderminster,6,1331,2,33,13,11
,10272,205,Macclesfield,7,3311,8,31,8,31
,10272,6106,Eastleigh,8,3310,7,31,10,30
,10272,1392,Aldershot,9,3031,5,31,12,03
,10272,921,Gateshead,10,3310,16,30,6,31
,10272,164,Grimsby,11,1113,18,11,5,13
,10272,991,Woking,12,3111,19,11,7,31
,10272,204,Torquay,13,0311,4,31,17,01
,10272,919,Southport,14,0013,14,03,18,01
,10272,185,Bristol Rovers,15,1003,9,13,22,00
,10272,909,Dover,16,0013,13,03,19,01
,10272,3551,Braintree Town,17,0300,10,30,20,00
,10272,1389,Altrincham,18,0030,12,03,21,00
,10272,213,Chester,19,0030,24,00,11,03
,10272,6140,Dartford,20,0101,20,01,15,10
,10272,1395,Welling,21,1001,17,11,24,00
,10272,982,Telford,22,1000,22,00,14,10
,10272,913,Nuneaton,23,0100,23,00,16,10
,10272,2792,Alfreton,24,0000,21,00,23,00'''


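# optional leading comma, four comma-terminated fields, then the digit string
regex2 = re.compile(r'^(?:,|)(?:.*?,){4}(\d+),.*$',re.M)
for line in match3g.split('\n'):
  print line
  match = regex2.search(line)
  # skip lines that do not have the expected shape instead of raising
  if match is not None:
    print list(match.group(1))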
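
Since the pattern is compiled with re.M, ^ and $ match at every line boundary, so it can also be run once over the whole string instead of line by line. A minimal sketch of that idea in Python 3 follows, using a shortened three-row sample; the names pattern and results are illustrative assumptions.

```python
import re

# Shortened sample in the same shape as the data above (assumption: three rows only).
match3g = """10272,201,Halifax,1,3333,1,33,2,33
,10272,989,Forest Green,2,3331,3,33,9,31
,10272,2792,Alfreton,24,0000,21,00,23,00"""

# Same pattern as above: with re.M, ^ and $ anchor at each line.
pattern = re.compile(r'^(?:,|)(?:.*?,){4}(\d+),.*$', re.M)

# finditer yields one match per row; group(1) is the digit string.
results = [','.join(m.group(1)) for m in pattern.finditer(match3g)]

for item in results:
    print("Test =", item)
```

This avoids the explicit split on '\n', at the cost of relying on every row having the expected number of fields.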