I have the following text, which I want to split into 3 groups per line.
"ax (0) 0 Critical Issues | 0 Non-critical",
"by master1 (0) 0 Critical Issues | 0 Non-critical",
"chef (1,507) 923 Critical Issues | 0 Non-critical",
"children (0) 0 Critical Issues | 0 Non-critical",
"chris test (1) 0 Critical Issues | 0 Noncritical",
"_Regression (8) 315 Critical Issues | 0 Noncriticals"
I would like the formatted text to look like this:
ax, 0, 0
by master1, 0, 0
chef, 923, 0
children, 0, 0
chris test, 0, 0
_Regression, 315, 0
Is this possible with a regular expression?
Answer 0: (score: 0)
Based on your text, I think this regular expression should work:
([_a-zA-Z0-9 ]+) \([0-9,]+\) ([0-9,]+) Critical Issues \| ([0-9,]+) Non-?criticals?
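You then need to use bracket-matching (capture groups) to extract the three values you want from each line and format them appropriately.
Basically, you put part of the regular expression between parentheses, ( and ), which marks that part of the pattern as a group. This lets you identify the portion of the input string that matched the pattern inside the parentheses.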
Input strings:
"456-abc-123",
"387-zxf-345",
"830-fft-492"
Regular expression:
[0-9]+-([a-z]+)-[0-9]+
With this pattern, we can extract the text matched by ([a-z]+) from each string, which gives us:
abc
zxf
fft
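As a minimal sketch of the capture-group idea above, here is how the same extraction might look in Python. The helper name extract_letters is made up for illustration, and using fullmatch (so the whole string must fit the pattern) is an assumption rather than something the example requires.

```python
import re

# Sample inputs from the example above.
inputs = ["456-abc-123", "387-zxf-345", "830-fft-492"]

# The parentheses around [a-z]+ form capture group 1.
pattern = re.compile(r"[0-9]+-([a-z]+)-[0-9]+")

def extract_letters(s):
    """Hypothetical helper: return the text captured by group 1, or None if s does not match."""
    m = pattern.fullmatch(s)
    return m.group(1) if m else None

letters = [extract_letters(s) for s in inputs]
print(letters)  # ['abc', 'zxf', 'fft']
```

Group 0 would be the entire match; groups are numbered from 1 in the order their opening parentheses appear.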
If you can tell us which language you want to use, we can provide a code solution.
# Python
import re

text = ["ax (0) 0 Critical Issues | 0 Non-critical",
        "by master1 (0) 0 Critical Issues | 0 Non-critical",
        "chef (1,507) 923 Critical Issues | 0 Non-critical",
        "children (0) 0 Critical Issues | 0 Non-critical",
        "chris test (1) 0 Critical Issues | 0 Noncritical",
        "_Regression (8) 315 Critical Issues | 0 Noncriticals"]

# Groups: 1 = name, 2 = critical count, 3 = non-critical count.
for entry in text:
    m = re.search(r'([_a-zA-Z0-9 ]+) \([0-9,]+\) ([0-9,]+) Critical Issues \| ([0-9,]+) Non-?criticals?', entry)
    print("%s, %s, %s" % (m.group(1), m.group(2), m.group(3)))
Output:
ax, 0, 0
by master1, 0, 0
chef, 923, 0
children, 0, 0
chris test, 0, 0
_Regression, 315, 0
# Ruby
text = ["ax (0) 0 Critical Issues | 0 Non-critical",
        "by master1 (0) 0 Critical Issues | 0 Non-critical",
        "chef (1,507) 923 Critical Issues | 0 Non-critical",
        "children (0) 0 Critical Issues | 0 Non-critical",
        "chris test (1) 0 Critical Issues | 0 Noncritical",
        "_Regression (8) 315 Critical Issues | 0 Noncriticals"]

# m[1] = name, m[2] = critical count, m[3] = non-critical count.
text.each do |entry|
  m = /([_a-zA-Z0-9 ]+) \([0-9,]+\) ([0-9,]+) Critical Issues \| ([0-9,]+) Non-?criticals?/.match(entry)
  printf("%s, %s, %s\n", m[1], m[2], m[3])
end
This also outputs:
ax, 0, 0
by master1, 0, 0
chef, 923, 0
children, 0, 0
chris test, 0, 0
_Regression, 315, 0
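If the real input might contain lines that do not fit the pattern, a slightly more defensive variant could look like the sketch below. It is only an illustration: the names LINE_RE and format_line and the group names are invented here, and returning None for a non-matching line is an assumption about how you would want to handle that case. The pattern is the same as above, written with re.VERBOSE so each part can be commented; literal spaces are written as [ ] because verbose mode ignores bare whitespace.

```python
import re

# Same pattern as in the answer, split up and annotated.
LINE_RE = re.compile(r"""
    (?P<name>[_a-zA-Z0-9 ]+)      # name: letters, digits, underscore, spaces
    [ ]\([0-9,]+\)                # total in parentheses, not captured
    [ ](?P<critical>[0-9,]+)      # critical issue count
    [ ]Critical[ ]Issues[ ]\|     # literal separator text
    [ ](?P<noncritical>[0-9,]+)   # non-critical issue count
    [ ]Non-?criticals?            # accepts the spelling variants in the data
""", re.VERBOSE)

def format_line(line):
    """Hypothetical helper: return 'name, critical, noncritical', or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return ", ".join([m.group("name"), m.group("critical"), m.group("noncritical")])

print(format_line("chef (1,507) 923 Critical Issues | 0 Non-critical"))  # chef, 923, 0
print(format_line("not a report line"))  # None
```

Named groups are optional; the numbered groups used earlier behave the same way, the names just make the formatting step easier to read.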