I have a variable like this:
metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
I need to write a for loop that uses only one metric name at a time: first (WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage), then (WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount), then (GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used),
and so on. The delimiter is | but not \|.
I tried creating an array:
data = []
data.append(metricName.split('|'))
But it gave me an array like this:
[['(WebSpherePMI\\', 'jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\', 'threadPoolModule\\', 'WebContainer:ActiveCount)', '(GC Monitor\\', 'Memory Pools\\', 'Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Invocations Per Interval Count)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']]
Any ideas on how I can get the metric names into a single flat array?
Answer 0 (score: 10)
You can split the string with a regular expression:
>>> import re
>>> re.split(r'(?<=\))\|(?=\()',metricName)
['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)', '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']
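Here, r'(?<=\))\|(?=\()' splits the string on each pipe character that sits directly between a ) and a (. It uses positive look-around assertions, so the parentheses themselves are not consumed and stay in the results.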
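To tie this back to the for loop the question asks about, here is a minimal sketch. The string `metric_name` is a shortened, made-up stand-in for the real `metricName`, and `patterns` is just an illustrative variable name.

```python
import re

# Shortened, hypothetical stand-in for the metricName string in the question.
metric_name = r'(A\|b:Cpu)|(A\|c\|d:Count)|(G\|(.*):Time \(ms\))'

# Split only on a pipe that has ")" right before it and "(" right after it.
patterns = re.split(r'(?<=\))\|(?=\()', metric_name)

# Use one metric pattern at a time.
for pattern in patterns:
    print(pattern)
```

Each loop iteration sees one complete parenthesized group, with its escaped pipes left untouched.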
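Answer 1 (score: 1)
You can't do a naive str.split, because you're looking for a context-sensitive split, i.e. split on any vertical bar that is not enclosed in parentheses.
You should probably use a regular expression, but my regex skills fail me here, so let's do something hacky instead.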
s = metricName  # the string to split
stack = 0       # current parenthesis nesting depth
tokens = []
last_start = 0
for i in range(len(s)):  # iterate through indexes of string s
    if s[i] == "(":
        stack += 1
    if s[i] == ")":
        stack = max(0, stack-1)
        # this will prevent breaking nested parentheses if you have
        # ugly parenthetical text like "A) this, B) that."
    if s[i] == "|" and stack == 0:
        tokens.append(s[last_start:i])
        last_start = i+1
tokens.append(s[last_start:])  # the final token has no trailing "|"
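The same depth-counting idea can be wrapped in a reusable function. This is only a sketch: the name `split_top_level` and the `sample` string are invented for illustration, and, like the loop above, it also counts backslash-escaped parentheses such as \( and \), so it assumes those are balanced.

```python
def split_top_level(s, sep="|"):
    """Split s on sep wherever the parenthesis depth is zero."""
    depth = 0
    tokens = []
    last_start = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)  # tolerate a stray ")"
        elif ch == sep and depth == 0:
            tokens.append(s[last_start:i])
            last_start = i + 1
    tokens.append(s[last_start:])  # text after the last separator
    return tokens


# Shortened, hypothetical stand-in for the metricName string.
sample = r'(A\|b:Cpu)|(G\|(.*):Time \(ms\))|(Last)'
parts = split_top_level(sample)

for part in parts:
    print(part)
```

Unlike the regular expressions, this version does not depend on the backslash at all: any pipe inside parentheses is kept, escaped or not.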
That said, if the only pipes inside your parentheses are always preceded by a backslash (as in your example), you can simply do:
re.split(r"(?<!\\)\|", s)
Answer 2 (score: 0)
You don't want to append to an existing empty list; you just want to create the list. So:
data = metricName.split('|')
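A small sketch of why the original attempt produced a list inside a list; `text` is a made-up example string. Note that this only addresses the nesting: a plain split('|') still breaks on the escaped \| pipes, as the output in the question shows.

```python
text = "a|b|c"

nested = []
nested.append(text.split("|"))    # appends the whole list as ONE element

flat = text.split("|")            # split() already returns a list

extended = []
extended.extend(text.split("|"))  # extend() adds the items one by one

print(nested)    # [['a', 'b', 'c']]
print(flat)      # ['a', 'b', 'c']
print(extended)  # ['a', 'b', 'c']
```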
Answer 3 (score: 0)
"The delimiter is | but not \|"
From what you describe, you want a negative lookbehind assertion.
Try this:
import re
metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
data = re.split(r"(?<!\\)\|", metricName)
This returns:
[(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage),
(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount),
(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used),
(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count),
(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\)),
(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)]
More on Python's regular expression functions, in particular the negative lookbehind assertion:
(?<!...)
https://docs.python.org/2/library/re.html
If you really only want to split on | when it sits between ) and (, then the look-around answer above is the best fit.
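To make the difference between the two patterns concrete, here is a short comparison on a made-up string (`sample` is hypothetical) that contains an item without parentheses:

```python
import re

# Hypothetical input: the middle item has no surrounding parentheses.
sample = r'(A\|b)|plain|(C\|d)'

# Negative lookbehind: split on every pipe not preceded by a backslash.
by_lookbehind = re.split(r'(?<!\\)\|', sample)

# Look-around: split only on a pipe directly between ")" and "(".
by_lookaround = re.split(r'(?<=\))\|(?=\()', sample)

print(by_lookbehind)  # ['(A\\|b)', 'plain', '(C\\|d)']
print(by_lookaround)  # ['(A\\|b)|plain|(C\\|d)']
```

On the metricName from the question both patterns give the same six items; they only diverge when an item is not wrapped in parentheses or an unescaped pipe appears inside a group.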