Question

I have a variable like this:

metricName = '(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'

I tried to create an array:

data = []

data.append(metricName.split('|'))

But it gives me an array like this:

[['(WebSpherePMI\\', 'jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\', 'threadPoolModule\\', 'WebContainer:ActiveCount)', '(GC Monitor\\', 'Memory Pools\\', 'Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Invocations Per Interval Count)', '(GC Monitor\\', 'Garbage Collectors\\', '(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']]

Any ideas on how I can get this into a single array, with one element per parenthesized group?

Answer 1

You can split the string with a regular expression:

>>> import re
>>> re.split(r'(?<=\))\|(?=\()',metricName)
['(WebSpherePMI\\|jvmRuntimeModule:ProcessCpuUsage)', '(WebSpherePMI\\|threadPoolModule\\|WebContainer:ActiveCount)', '(GC Monitor\\|Memory Pools\\|Java heap:Percentage of Maximum Capacity Currently Used)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Invocations Per Interval Count)', '(GC Monitor\\|Garbage Collectors\\|(.*):GC Time Per Interval \\(ms\\))', '(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)']

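In this case, r'(?<=\))\|(?=\()' splits the string on every pipe character that sits between a ) and a (. It uses positive look-arounds for the match, so the parentheses themselves are not consumed.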
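To make the pieces of that pattern easier to see, here is the same regex written with re.VERBOSE and annotated part by part. This is only a sketch: the name SPLIT_BETWEEN_GROUPS and the short sample string are made up for illustration and stand in for metricName.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can be annotated.
SPLIT_BETWEEN_GROUPS = re.compile(
    r"""
    (?<=\))   # lookbehind: the previous character must be ')'
    \|        # the pipe itself, the only character that is consumed
    (?=\()    # lookahead: the next character must be '('
    """,
    re.VERBOSE,
)

# Hypothetical short input standing in for metricName.
sample = r"(A\|b:X)|(A\|c\|d:Y)|(E:Z)"
parts = SPLIT_BETWEEN_GROUPS.split(sample)
print(parts)
```

Because the look-arounds are zero-width, each resulting element keeps its opening and closing parenthesis.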

Answer 2

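You can't do a naive str.split, because you're looking for a context-sensitive split, i.e.

split on any vertical bar that is not enclosed in parentheses.

You should probably use a regex, but my regex skills are failing me, so let's do something unorthodox instead.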

# s is the string to split, e.g. s = metricName
stack = 0        # current parenthesis nesting depth
tokens = []
last_start = 0   # index where the current token begins
for i, ch in enumerate(s):
    if ch == "(":
        stack += 1
    elif ch == ")":
        # max() keeps the depth from going negative on unbalanced
        # parenthetical text like "A) this, B) that."
        stack = max(0, stack - 1)
    elif ch == "|" and stack == 0:
        tokens.append(s[last_start:i])
        last_start = i + 1
tokens.append(s[last_start:])  # keep the final token after the last "|"
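If you want to reuse this, the depth-counting idea can be wrapped in a function. The sketch below is one possible shape, not the only one: the name split_top_level is hypothetical, and skipping backslash-escaped characters is an added assumption so that escaped parentheses such as \( and \) never affect the depth count.

```python
def split_top_level(text, sep="|"):
    """Split text on sep, ignoring separators inside parentheses."""
    depth = 0
    tokens = []
    start = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # assumption: skip the backslash and the character it escapes
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(0, depth - 1)
        elif ch == sep and depth == 0:
            tokens.append(text[start:i])
            start = i + 1
        i += 1
    tokens.append(text[start:])  # the final piece after the last separator
    return tokens


# Hypothetical short input with the same shape as metricName.
sample = r"(A\|b:X)|(G\|(.*):T \(ms\))|(E:Z)"
result = split_top_level(sample)
print(result)
```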

That said, if every vertical bar inside parentheses is preceded by a backslash (as in your example), you can do this:

re.split(r"(?<!\\)\|", s)

Answer 3

You don't want to append to an existing empty list; you just want to create a list. So:

data = metricName.split('|')
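A small sketch of why append produced a nested list, using a short made-up string rather than metricName. Note that a plain split still cuts at every |, including the \| ones inside the parentheses, so this only addresses the nesting.

```python
sample = "a|b|c"  # hypothetical short string

nested = []
nested.append(sample.split("|"))     # appends the whole list as ONE element

flat = sample.split("|")             # split already returns a list

extended = []
extended.extend(sample.split("|"))   # adds each piece individually

print(nested)
print(flat)
print(extended)
```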

Answer 4

The delimiter is | but not \|

From what you describe, you want a negative lookbehind assertion.

Try this:

import re
metricName = r'(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage)|(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount)|(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used)|(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count)|(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\))|(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)'
data = re.split(r"(?<!\\)\|", metricName)

This returns:

[(WebSpherePMI\|jvmRuntimeModule:ProcessCpuUsage),
(WebSpherePMI\|threadPoolModule\|WebContainer:ActiveCount),
(GC Monitor\|Memory Pools\|Java heap:Percentage of Maximum Capacity Currently Used),
(GC Monitor\|Garbage Collectors\|(.*):GC Invocations Per Interval Count),
(GC Monitor\|Garbage Collectors\|(.*):GC Time Per Interval \(ms\)),
(GC Monitor:Percentage of Time Spent in GC during last 15 minutes)]

Here is more on the regular expression functions in Python, in particular the negative lookbehind assertion:

(?<!...)

https://docs.python.org/2/library/re.html

If you really only want to split on | when it sits between ) and (, then the answer above is the best one.
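The two patterns give the same result on your metricName, but they are not equivalent. A quick sketch of where they differ, using a made-up string in which one group contains an unescaped |:

```python
import re

# Hypothetical input: the second group contains an unescaped "|".
sample = r"(a\|b)|(c|d)|(e)"

# Negative lookbehind: split on every "|" not preceded by a backslash.
by_escape = re.split(r"(?<!\\)\|", sample)

# Look-arounds: split only on a "|" that sits between ")" and "(".
by_parens = re.split(r"(?<=\))\|(?=\()", sample)

print(by_escape)
print(by_parens)
```

The lookbehind version breaks the (c|d) group in two, while the look-around version keeps it whole, so which one fits depends on whether every inner pipe in your data is guaranteed to be escaped.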

How do you put text into an array in Python
