Matching RegEx (named) groups that appear in random order (Python re)

Time: 2017-08-05 23:24:40

Tags: python regex regex-group

I am trying to match the RegEx named groups (preArgs, apm1Args, midArgs, apm2Args, postArgs), which can appear in random order. I can match test string 1, but not test string 2 below.

I need to satisfy the following requirements:

1. Each group may occur one or more times (because of leftover junk), or be absent entirely.

2. Apart from its unique javaagent jar, each of apm1Args and apm2Args always comes with one or more -D switches.
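Because each APM block may occur zero or more times and in either order, one option (a sketch, not the asker's code) is to stop describing the whole line left to right and instead match the blocks independently: an alternation of two named groups scanned with `re.finditer`, where `Match.lastgroup` reports which vendor matched. The exact sub-patterns (`\S*` for the jar path, `\S+` for each switch) are assumptions based on the sample strings, and `find_apm_blocks` is a hypothetical helper name. The `*` after the switch group also accepts a bare jar; change it to `+` to enforce requirement 2 strictly.

```python
import re

# Assumed pattern: a javaagent jar followed by zero or more of that vendor's
# -D switches. Exactly one named group participates in each match.
APM_RE = re.compile(
    r"(?P<apm1Args>-javaagent:\S*/agent1\.jar(?:\s+-Dvendor1\.agent1\.\S+)*)"
    r"|(?P<apm2Args>-javaagent:\S*/agent2\.jar(?:\s+-Dvendor2\.agent2\.\S+)*)"
)


def find_apm_blocks(cmdline):
    """Collect every APM block, keyed by the named group that matched it."""
    found = {"apm1Args": [], "apm2Args": []}
    for m in APM_RE.finditer(cmdline):
        # lastgroup is the name of the alternative that actually matched
        found[m.lastgroup].append(m.group())
    return found


# Shortened, hypothetical sample with the agent2 block first
sample = ("-Xdebug -javaagent:/p2/agent2.jar -Dvendor2.agent2.x=1 "
          "-javaagent:/p1/agent1.jar -Dvendor1.agent1.y=2 -Xgcpolicy:gencon")
print(find_apm_blocks(sample))
```

Since `\S*` cannot cross a space, the agent1 alternative never swallows a neighbouring agent2 block, and repeated blocks simply show up as extra list entries.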

I tried some alternation (|) options and (?=) positive lookaheads, but had no luck and got lost in the maze... My attempt:

RegEx (also available on regex101.com):

^(?P<preArgs>.*)(?P<apm1Args>-javaagent:.+\/agent1\.jar\s+(?:-Dvendor1\.agent1\.\S+\s*)*)(?P<midArgs>.*)(?P<apm2Args>-javaagent:.+\/agent2\.jar\s+(?:-Dvendor2\.agent2\.\S+\s*)*)(?P<postArgs>.*)$
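If a single regex is preferred, one way to make it order-independent is to anchor at the start and wrap each APM block in its own optional positive lookahead, so each named group is searched for from position 0 regardless of where the other block sits. This is a sketch reusing the asker's group names; unlike the pattern above it does not capture preArgs/midArgs/postArgs, and each lookahead captures only the first occurrence of its block. The sub-patterns are assumptions, and `APM_ANY_ORDER` is a hypothetical name.

```python
import re

# Each (?=(?:.*?X)?) is an optional lookahead from position 0: it captures X
# wherever it first occurs (or captures nothing) without consuming input.
APM_ANY_ORDER = re.compile(
    r"^(?=(?:.*?(?P<apm1Args>-javaagent:\S*/agent1\.jar(?:\s+-Dvendor1\.agent1\.\S+)*))?)"
    r"(?=(?:.*?(?P<apm2Args>-javaagent:\S*/agent2\.jar(?:\s+-Dvendor2\.agent2\.\S+)*))?)"
)

m = APM_ANY_ORDER.match("-Xdebug -javaagent:/p2/agent2.jar -Dvendor2.agent2.x=1 "
                        "-javaagent:/p1/agent1.jar -Dvendor1.agent1.y=2")
print(m.group("apm1Args"))  # -javaagent:/p1/agent1.jar -Dvendor1.agent1.y=2
print(m.group("apm2Args"))  # -javaagent:/p2/agent2.jar -Dvendor2.agent2.x=1
```

Because both lookaheads are optional, the match itself always succeeds; an absent block just leaves its group as `None`.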

Test string 1

-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2

Test string 2 (the same RegEx, with a different regex101.com link)

-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon
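The failure on test string 2 follows from the pattern's shape: apm1Args is hard-coded before apm2Args, so once the agent2 block comes first there is no later `-javaagent:...agent2.jar` left for apm2Args. A quick check with Python's `re`, using the asker's pattern unchanged; the two strings are shortened copies of the test strings (an assumption that keeps the block ordering intact):

```python
import re

# The asker's pattern, split into adjacent raw strings (concatenated by Python)
ASKER_RE = re.compile(
    r"^(?P<preArgs>.*)"
    r"(?P<apm1Args>-javaagent:.+\/agent1\.jar\s+(?:-Dvendor1\.agent1\.\S+\s*)*)"
    r"(?P<midArgs>.*)"
    r"(?P<apm2Args>-javaagent:.+\/agent2\.jar\s+(?:-Dvendor2\.agent2\.\S+\s*)*)"
    r"(?P<postArgs>.*)$"
)

# agent1 block first (like test string 1)
s1 = ("-Xdebug -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 "
      "-Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar "
      "-Dvendor2.agent2.customValue1=myValue2")
# agent2 block first (like test string 2)
s2 = ("-Xdebug -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.customValue1=myValue2 "
      "-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 "
      "-Xgcpolicy:gencon")

print(ASKER_RE.match(s1) is not None)  # True
print(ASKER_RE.match(s2))              # None
```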

Update

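I ended up using a 'loop' approach in Python to clean up the 'apmArgs' groups, which appear in random order or not at all. Below is my snippet (also available for testing on repl.it):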

import re

# raw strings, so that \s, \S, \. and \/ reach the regex engine unchanged
regExArr = [
    r'(?P<preArgs>.*)(?P<apmArgs>-javaagent:\s*\/\S+agent1\.jar\s+(?:-Dvendor1\.agent1\.\S+\s*)*)(?P<postArgs>.*)',
    r'(?P<preArgs>.*)(?P<apmArgs>-javaagent:\s*\/\S+agent2\.jar\s+(?:-Dvendor2\.agent2\.\S+\s*)*)(?P<postArgs>.*)',
]

testStrList=[
  '-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2'
,'-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1'
,'-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon'
,'-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -Xgcpolicy:gencon'
]

newApmArgs='-javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234 -Dvendor3.agent3.tierName=myTier13 -Dvendor3.agent3.nodeName=myNode13 -Dvendor3.agent3.uniqueHostId=myHost13'

for i, testStr in enumerate(testStrList):
    for regEx in regExArr:
        matchedArgs = re.search(regEx, testStr)
        while matchedArgs:
            print("matchedArgs found count:", len(matchedArgs.groups()))
            print("matchedArgs found:")
            print(matchedArgs.groups())
            # drop the <apmArgs> group and concatenate the other groups
            testStr = (matchedArgs.group('preArgs').strip() + ' ' + matchedArgs.group('postArgs').strip()).strip()
            # check again for a leftover <apmArgs> block and repeat the clean-up
            matchedArgs = re.search(regEx, testStr)

    testStrList[i] = testStr + ' ' + newApmArgs

print("cleaned up list testStrList that had Random groups of APM Args Text (now appended with 3rd type APM Args) is:")
print(testStrList)

Output:

matchedArgs found count: 3
matchedArgs found:
('-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 ', '-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 ', '-Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2')
matchedArgs found count: 3
matchedArgs found:
('', '-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1 ', '-Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2')
matchedArgs found count: 3
matchedArgs found:
('-Xgcpolicy:gencon ', '-javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/path2/to/profiles/agent2.profile -Dvendor2.agent2.customValue1=myValue2', '')
matchedArgs found count: 3
matchedArgs found:
('-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 ', '-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName=myNode1 -Dvendor1.agent1.uniqueHostId=myHost1', '')
matchedArgs found count: 3
matchedArgs found:
('-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2 ', '-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 ', '-Xgcpolicy:gencon')
matchedArgs found count: 3
matchedArgs found:
('-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 ', '-javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2 ', '-Xgcpolicy:gencon')
cleaned up list testStrList that had Random groups of APM Args Text (now appended with 3rd type APM Args) is:
['-Xgcpolicy:gencon -javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234 -Dvendor3.agent3.tierName=myTier13 -Dvendor3.agent3.nodeName=myNode13 -Dvendor3.agent3.uniqueHostId=myHost13', '-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234 -Dvendor3.agent3.tierName=myTier13 -Dvendor3.agent3.nodeName=myNode13 -Dvendor3.agent3.uniqueHostId=myHost13', '-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -Xgcpolicy:gencon -javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234 -Dvendor3.agent3.tierName=myTier13 -Dvendor3.agent3.nodeName=myNode13 -Dvendor3.agent3.uniqueHostId=myHost13', '-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -Xgcpolicy:gencon -javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234 -Dvendor3.agent3.tierName=myTier13 -Dvendor3.agent3.nodeName=myNode13 -Dvendor3.agent3.uniqueHostId=myHost13']
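The `while` loop above re-runs each pattern until no block is left. `re.sub` already replaces every non-overlapping occurrence in a single pass, so the same clean-up could be sketched more compactly. The pattern list, the helper name `replace_apm_args` and the whitespace normalisation are assumptions, not the asker's code; the patterns describe only the APM block itself, without the preArgs/postArgs context groups.

```python
import re

# Assumed patterns: only the APM block itself, no pre/post context groups.
APM_PATTERNS = [
    re.compile(r"-javaagent:\s*/\S*agent1\.jar(?:\s+-Dvendor1\.agent1\.\S+)*"),
    re.compile(r"-javaagent:\s*/\S*agent2\.jar(?:\s+-Dvendor2\.agent2\.\S+)*"),
]


def replace_apm_args(cmdline, new_apm_args):
    """Drop every known APM block, collapse whitespace, append the new block."""
    for pattern in APM_PATTERNS:
        cmdline = pattern.sub(" ", cmdline)  # removes all occurrences at once
    return " ".join(cmdline.split() + [new_apm_args])


new = "-javaagent:/path3/to/agent3.jar -Dvendor3.agent3.applicationName=app1234"
print(replace_apm_args("-Xdebug -javaagent:/path1/to/agent1.jar "
                       "-Dvendor1.agent1.nodeName= -Xgcpolicy:gencon", new))
```

Replacing with a space and then re-joining on whitespace avoids the manual `strip()` bookkeeping of the loop version.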

1 answer:

Answer 0 (score: 0)

You might find that a pyparsing approach gets you there faster than wrangling regexes. Here is a parser that handles both test strings:

import pyparsing as pp

# just some punctuation
COLON,EQ = map(pp.Suppress, ':=')

# expressions for key=value,... switches
subkey = pp.Word(pp.alphas)
subvalue = pp.pyparsing_common.integer | pp.Word(pp.printables, excludeChars=',')
key_value_list = pp.Dict(pp.delimitedList(pp.Group(subkey + EQ + subvalue)))

# parse switches
switch_key = pp.Word('-', pp.alphas).setParseAction(lambda t: t[0][1:].lower())
switch_value = key_value_list | subvalue
switch = switch_key + pp.Optional(COLON + switch_value)

# -D definitions
java_path_name = pp.delimitedList(pp.pyparsing_common.identifier, delim='.', combine=True)
defn = (pp.Suppress("-D") +  java_path_name.leaveWhitespace()
        + EQ.leaveWhitespace() 
        + pp.Optional(subvalue().leaveWhitespace()))

# define parser for the entire line - use Dict class to define dynamic key-value structures instead of just 2-tuples
parser = pp.Dict(pp.OneOrMore(pp.Group(defn | switch)))

tests = """\
-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2
-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon
"""
parser.runTests(tests)

Prints:

-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2
[['xdebug'], ['xnoagent'], ['xrunjdwp', ['transport', 'dt_socket'], ['server', 'y'], ['suspend', 'y'], ['address', 7777]], ['javaagent', '/path1/to/agent1.jar'], ['vendor1.agent1.applicationName', 'app123'], ['vendor1.agent1.tierName', 'myTier1'], ['vendor1.agent1.nodeName'], ['vendor1.agent1.uniqueHostId', 'myHost1'], ['xgcpolicy', 'gencon'], ['javaagent', '/path2/to/vendor2/agent2.jar'], ['vendor2.agent2.agentProfile', '/metlife/runtime/installed/apm/profiles/csa.profile'], ['vendor2.agent2.customValue1', 'myValue2']]
- javaagent: '/path2/to/vendor2/agent2.jar'
- vendor1.agent1.applicationName: 'app123'
- vendor1.agent1.nodeName: ''
- vendor1.agent1.tierName: 'myTier1'
- vendor1.agent1.uniqueHostId: 'myHost1'
- vendor2.agent2.agentProfile: '/metlife/runtime/installed/apm/profiles/csa.profile'
- vendor2.agent2.customValue1: 'myValue2'
- xdebug: ''
- xgcpolicy: 'gencon'
- xnoagent: ''
- xrunjdwp: [['transport', 'dt_socket'], ['server', 'y'], ['suspend', 'y'], ['address', 7777]]
  - address: 7777
  - server: 'y'
  - suspend: 'y'
  - transport: 'dt_socket'


-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=7777 -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.agentProfile=/metlife/runtime/installed/apm/profiles/csa.profile -Dvendor2.agent2.customValue1=myValue2 -javaagent:/path1/to/agent1.jar -Dvendor1.agent1.applicationName=app123 -Dvendor1.agent1.tierName=myTier1 -Dvendor1.agent1.nodeName= -Dvendor1.agent1.uniqueHostId=myHost1 -Xgcpolicy:gencon
[['xdebug'], ['xnoagent'], ['xrunjdwp', ['transport', 'dt_socket'], ['server', 'y'], ['suspend', 'y'], ['address', 7777]], ['javaagent', '/path2/to/vendor2/agent2.jar'], ['vendor2.agent2.agentProfile', '/metlife/runtime/installed/apm/profiles/csa.profile'], ['vendor2.agent2.customValue1', 'myValue2'], ['javaagent', '/path1/to/agent1.jar'], ['vendor1.agent1.applicationName', 'app123'], ['vendor1.agent1.tierName', 'myTier1'], ['vendor1.agent1.nodeName'], ['vendor1.agent1.uniqueHostId', 'myHost1'], ['xgcpolicy', 'gencon']]
- javaagent: '/path1/to/agent1.jar'
- vendor1.agent1.applicationName: 'app123'
- vendor1.agent1.nodeName: ''
- vendor1.agent1.tierName: 'myTier1'
- vendor1.agent1.uniqueHostId: 'myHost1'
- vendor2.agent2.agentProfile: '/metlife/runtime/installed/apm/profiles/csa.profile'
- vendor2.agent2.customValue1: 'myValue2'
- xdebug: ''
- xgcpolicy: 'gencon'
- xnoagent: ''
- xrunjdwp: [['transport', 'dt_socket'], ['server', 'y'], ['suspend', 'y'], ['address', 7777]]
  - address: 7777
  - server: 'y'
  - suspend: 'y'
  - transport: 'dt_socket'

Here is some sample code for accessing the parsed fields:

t0 = tests.splitlines()[0]
result = parser.parseString(t0)
print(result.xrunjdwp.address)
print(result['vendor1.agent1.applicationName'])

Prints:

7777
app123
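Note that `pp.Dict` keeps one value per key: for the first test line the `- javaagent:` entry lists only '/path2/to/vendor2/agent2.jar', although the full token list contains both jars. If every agent has to be kept, or pyparsing is not available, a rough standard-library sketch could tokenize with `shlex.split` and collect agents in a list. This is a swapped-in technique, not part of the answer above; `parse_jvm_args` is a hypothetical name, and it assumes every `-D` switch has the form `-Dkey=value` (a missing value becomes `''`).

```python
import shlex


def parse_jvm_args(cmdline):
    """Split a JVM command line into javaagent jars, -D properties and other switches."""
    agents, props, other = [], {}, []
    for tok in shlex.split(cmdline):  # shell-like split, honours quoted values
        if tok.startswith("-javaagent:"):
            agents.append(tok[len("-javaagent:"):])  # keep every agent, in order
        elif tok.startswith("-D"):
            key, _, value = tok[2:].partition("=")   # "-Dk=" gives value ''
            props[key] = value
        else:
            other.append(tok)
    return agents, props, other


s = ("-Xdebug -javaagent:/path2/to/vendor2/agent2.jar -Dvendor2.agent2.customValue1=myValue2 "
     "-javaagent:/path1/to/agent1.jar -Dvendor1.agent1.nodeName= -Xgcpolicy:gencon")
agents, props, other = parse_jvm_args(s)
print(agents)
print(props)
print(other)
```

Unlike the pyparsing version, this does not break `-Xrunjdwp:...` into its key=value pairs; those switches are simply passed through in `other`.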