How do you put text into an array in Python?

Asked: 2015-02-17 15:48:01

Tags: python regex

I have a variable like this:

metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'

I need to write a for loop that uses only one metric name at a time. For example, first (WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage), then (WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount), then (GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used), and so on. The delimiter is | but not the escaped form \|

I tried to create an array:

data = []

data.append(metricName.split('|'))

But it gives me an array like this:

[['(WebSpherePMI\\', 'jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\', 'threadPoolModule\\', 'WebContainer:ActiveCount)', '(GC Monitor\\', 'Memory Pools\\', 'Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Invocations Per Interval Count)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']]

Any ideas on how I can get this into a single array?

4 Answers:

Answer 0: (score: 10)

You can split the string with a regular expression:

>>> import re
>>> re.split(r'(?<=\))\|(?=\()',metricName)
['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)', '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

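Here, r'(?<=\))\|(?=\()' splits the string at every pipe character that sits between a ) and a (. It uses positive look-around assertions: the lookbehind (?<=\)) requires a ) just before the pipe, and the lookahead (?=\() requires a ( just after it, and neither parenthesis is consumed by the split.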
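To connect this back to the for loop from the question, here is a minimal sketch. The `sample` string is a shortened, made-up stand-in for `metricName`, and the names `pattern` and `metrics` are only illustrative.

```python
import re

# Shortened, hypothetical sample in the same shape as metricName.
# A raw string keeps every backslash literal.
sample = r'(A\|b:Cpu)|(A\|c\|d:Count)|(GC Monitor:Time \(ms\))'

# Split only on a pipe with ")" right before it and "(" right after it.
pattern = re.compile(r'(?<=\))\|(?=\()')

metrics = pattern.split(sample)

# Process one metric name per iteration.
for metric in metrics:
    print(metric)
```

Each pass through the loop sees one complete parenthesized group, with its escaped pipes left intact.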

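Answer 1: (score: 1)

You can't use a naive str.split here, because you need a context-sensitive split, i.e.

  split on any vertical bar that is not enclosed in parentheses

You should probably use a regular expression, but my regex-fu fails me, so let's do something hacky instead.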

s = metricName  # the string to split
stack = 0       # current parenthesis nesting depth
tokens = []
last_start = 0
for i in range(len(s)): # iterate through indexes of string s
    if s[i] == "(":
        stack += 1
    if s[i] == ")":
        stack = max(0, stack-1)
        # clamp at 0 so stray closing parentheses, as in text like
        # "A) this, B) that.", don't push the depth negative
    if s[i] == "|" and stack == 0:
        tokens.append(s[last_start:i])
        last_start = i+1
tokens.append(s[last_start:])  # don't drop the text after the last pipe
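The same depth-counting idea could be packaged as a reusable function. This is only a sketch: the name `split_top_level`, its `sep` parameter, and the shortened `sample` string are assumptions made for illustration, not part of the original answer.

```python
def split_top_level(s, sep="|"):
    """Split s on sep wherever the parenthesis depth is zero."""
    depth = 0
    tokens = []
    last_start = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)  # never go below zero
        elif ch == sep and depth == 0:
            tokens.append(s[last_start:i])
            last_start = i + 1
    tokens.append(s[last_start:])  # text after the last separator
    return tokens


# Hypothetical, shortened sample with a nested group "(.*)".
sample = r'(A\|b:Cpu)|(A\|c\|(.*):Count)|(GC Monitor:Time)'
parts = split_top_level(sample)
```

Because it counts depth rather than looking at neighbouring characters, this approach does not care whether the inner pipes are escaped, but it does assume the parentheses in the input are balanced.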

That said, if every pipe inside parentheses is preceded by a backslash (as in your example), you can simply do:

re.split(r"(?<!\\)\|", s)

Answer 2: (score: 0)

You don't want to append to an existing empty list; you just want to create a list. So:

data = metricName.split('|')
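A small sketch of why the attempt in the question produced a list inside a list. The `text` string and the variable names are made up for illustration.

```python
text = 'a|b|c'

nested = []
nested.append(text.split('|'))    # appends the whole list as ONE element

flat = text.split('|')            # split() already returns a list

extended = ['x']
extended.extend(text.split('|'))  # adds each piece to an existing list

print(nested)    # [['a', 'b', 'c']]
print(flat)      # ['a', 'b', 'c']
print(extended)  # ['x', 'a', 'b', 'c']
```

Note that a plain split('|') still splits on the escaped \| sequences, so for the string in the question it would need to be combined with one of the regex approaches from the other answers.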

Answer 3: (score: 0)

  Delimiter is | but not this \|

From what you describe, you want a negative lookbehind assertion.

Try this:

import re
metricName = r'(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
data = re.split(r"(?<!\\)\|", metricName)

This returns:

['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)',
 '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)',
 '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)',
 '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))',
 '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

The Python documentation has more on the regular expression functions, and on the negative lookbehind assertion in particular:

(?<!...)

https://docs.python.org/2/library/re.html

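If you really only want to split on | when it sits between ) and (, then the look-around answer above (Answer 0) is the best choice.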
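To make the difference between the two patterns concrete, here is a small sketch on a made-up input in which the second group contains an unescaped pipe. The `sample` string and all variable names are hypothetical.

```python
import re

between_parens = re.compile(r'(?<=\))\|(?=\()')  # pipe between ")" and "("
not_escaped = re.compile(r'(?<!\\)\|')           # pipe not preceded by "\"

# Hypothetical input: the second group holds an unescaped pipe.
sample = r'(A\|b)|(c|d)'

by_parens = between_parens.split(sample)
by_escape = not_escaped.split(sample)

print(by_parens)
print(by_escape)
```

The look-around pattern keeps (c|d) in one piece, while the negative lookbehind also splits on the unescaped pipe inside it. On the string from the question both patterns give the same six items, so the choice only matters if the data can contain cases like this one.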