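I have a Java program that reads a file line by line and tries to match each line against one of four regular expressions. Depending on which expression matches, the program performs a specific action. This is what I have: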
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private void processFile(ArrayList<String> lines) {
    ArrayList<Component> components = new ArrayList<>();
    Pattern pattern = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");
    Matcher matcher;
    // Go through each line and see if it matches any of the regexes defined above
    Component currentComponent = null;
    for (String line : lines) {
        matcher = pattern.matcher(line);
        if (matcher.find()) {
            // We found a tag. Find out which one
            String match = matcher.group();
            if (match.startsWith("Obj")) {
                // We've got the object name; store the previous component first
                if (currentComponent != null) {
                    components.add(currentComponent);
                }
                currentComponent = new Component();
                currentComponent.setName(matcher.group(1));
            } else if (currentComponent != null) {
                if (match.startsWith("{CAT")) {
                    currentComponent.setCategory(matcher.group(2));
                } else if (match.startsWith("{CODE")) {
                    currentComponent.setOrderCode(matcher.group(3));
                } else if (match.startsWith("{DESC")) {
                    currentComponent.setDescription(matcher.group(4));
                }
            }
        }
    }
    // Don't forget the last component in the file
    if (currentComponent != null) {
        components.add(currentComponent);
    }
}
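As you can see, I combined the four regular expressions into one and apply the whole regex to each line. If a match is found, I check the start of the matched string to work out which expression matched, and then extract the data from the corresponding group. If anyone is interested in running the code, here is some sample data: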
Object name.......: PMF3800SN
Last modified.....: Wednesday 9 November 2011 11:55:04 AM
File offset (hex).: 00140598 (Hex).
Checksum (hex)....: C1C0 (Hex).
Size (bytes)......: 1,736
Properties........: {*DEVICE}
{PREFIX=Q}
{*PROPDEFS}
{PACKAGE="PCB Package",PACKAGE,1,SOT-323 MOSFET}
{*INDEX}
{CAT=Transistors}
{SUBCAT=MOSFET}
{MFR=NXP}
{DESC=N-channel TrenchMOS standard level FET with ESD protection}
{CODE=1894711}
{*COMPONENT}
{PACKAGE=SOT-323 MOSFET}
*PINOUT SOT-323 MOSFET
{ELEMENTS=1}
{PIN "D" = D}
{PIN "G" = G}
{PIN "S" = S}
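Although my code works, I don't like the fact that I repeat part of each string later, when I call startsWith.
I'd be interested to know how others would write this.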
阿姆鲁
Answer 0 (score: 3)
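If a group did not participate in the match, group(n) returns null. So you can wrap each subexpression in its own group and, after matching, check which of those groups is non-null: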
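Pattern pattern = Pattern.compile(
        "(Object name\\.{7}: (.++))|"
        + "(\\{CAT=([^\\}]++)\\})|"
        + "(\\{CODE=([^\\}]++)\\})|"
        + "(\\{DESC=([^\\}]++)\\})");
...
if (matcher.group(1) != null) {        // Object name...; the name itself is in group 2
    ...
} else if (matcher.group(3) != null) { // {CAT=...}; the value is in group 4
    ...
} ...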
In fact, if your subexpressions don't contain any | of their own, you can even do this with the groups you already have.
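To illustrate that last point, here is a minimal sketch that keeps the question's regex unchanged (one capturing group per alternative) and simply tests group(n) for null. The class name ExistingGroups and its classify method are hypothetical, and the "field=value" strings stand in for the question's Component setters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the question's regex already has exactly one capturing group per
// alternative, so a non-null group(n) tells us which alternative matched.
class ExistingGroups {
    static final Pattern TAGS = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    // Returns "field=value" for a recognised line, or null if nothing matched.
    static String classify(String line) {
        Matcher m = TAGS.matcher(line);
        if (!m.find()) {
            return null;
        }
        if (m.group(1) != null) return "name=" + m.group(1);
        if (m.group(2) != null) return "category=" + m.group(2);
        if (m.group(3) != null) return "code=" + m.group(3);
        if (m.group(4) != null) return "description=" + m.group(4);
        return null; // not reached: the matching alternative's group always participates
    }

    public static void main(String[] args) {
        System.out.println(classify("Object name.......: PMF3800SN")); // name=PMF3800SN
        System.out.println(classify("{CODE=1894711}"));                // code=1894711
        System.out.println(classify("{SUBCAT=MOSFET}"));               // null
    }
}
```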
Answer 1 (score: 2)
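As @axtavt pointed out, you can find out directly whether a group participated in the match. You don't even need to change the regex; you already have one capturing group for each alternative. I like to test with the start(n) method because it seems tidier, but checking group(n) for null (as @axtavt did) gives the same result. Here's an example: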
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void processFile(ArrayList<String> lines) {
    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");
    // Create the Matcher once and reset it to each line as we go.
    Matcher m = p.matcher("");
    for (String line : lines) {
        if (m.reset(line).find()) {
            // If group #n participated in the match, start(n) will be non-negative.
            if (m.start(1) != -1) {
                System.out.printf("%ncreating new component...%n");
                System.out.printf("  name: %s%n", m.group(1));
            } else if (m.start(2) != -1) {
                System.out.printf("  category: %s%n", m.group(2));
            } else if (m.start(3) != -1) {
                System.out.printf("  order code: %s%n", m.group(3));
            } else if (m.start(4) != -1) {
                System.out.printf("  description: %s%n", m.group(4));
            }
        }
    }
}
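The same start(n) test can also drive a loop over groupCount() instead of an explicit if/else chain. This is a sketch, not part of the answer above: the FirstGroup class and the LABELS array are hypothetical names, and the labels must stay in the same order as the groups in the regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find the first capturing group that took part in the match by looping
// over groupCount() and testing start(n), instead of listing each group by hand.
class FirstGroup {
    static final Pattern TAGS = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    // Hypothetical labels, one per capturing group, in the same order as the regex.
    static final String[] LABELS = { "name", "category", "order code", "description" };

    // Returns "label: value" for the participating group, or null if the line does not match.
    static String describe(String line) {
        Matcher m = TAGS.matcher(line);
        if (!m.find()) {
            return null;
        }
        for (int n = 1; n <= m.groupCount(); n++) {
            // start(n) is -1 exactly when group(n) is null: the group did not participate.
            if (m.start(n) != -1) {
                return LABELS[n - 1] + ": " + m.group(n);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(describe("{CAT=Transistors}")); // category: Transistors
        System.out.println(describe("{PREFIX=Q}"));        // null
    }
}
```

Note that LABELS ties each label to a group position, which is the kind of coupling that can drift out of sync when the regex is edited.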
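That said, I'm not sure I agree with your reasoning about repeating parts of the strings in your code. If the data format changes, or you change which fields you extract, it seems to me that code keyed only on group numbers would be easier to get out of sync when you update it. In other words, your current code isn't redundant; it's self-documenting. :D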
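If the worry is that bare group numbers can drift out of sync with the regex, one alternative (not used in the answers here) is Java's named capturing groups: (?<name>...) in the pattern, read back with Matcher.group(String), available since Java 7. A minimal sketch; the class NamedGroups and the group names name, category, code and desc are chosen purely for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch using named capturing groups (Java 7+): each alternative is identified
// by a name instead of by its position in the regex.
class NamedGroups {
    static final Pattern TAGS = Pattern.compile(
            "Object name\\.{7}: (?<name>.++)|"
            + "\\{CAT=(?<category>[^\\}]++)\\}|"
            + "\\{CODE=(?<code>[^\\}]++)\\}|"
            + "\\{DESC=(?<desc>[^\\}]++)\\}");

    // Returns "label: value" for a recognised line, or null if nothing matched.
    static String describe(String line) {
        Matcher m = TAGS.matcher(line);
        if (!m.find()) return null;
        if (m.group("name") != null) return "name: " + m.group("name");
        if (m.group("category") != null) return "category: " + m.group("category");
        if (m.group("code") != null) return "order code: " + m.group("code");
        return "description: " + m.group("desc"); // only remaining alternative
    }

    public static void main(String[] args) {
        System.out.println(describe("{CODE=1894711}")); // order code: 1894711
        System.out.println(describe("{MFR=NXP}"));      // null
    }
}
```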
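Edit: In a comment you mentioned the possibility of processing the whole file at once instead of line by line. That is actually the simpler approach: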
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static void processFile(String contents) {
    Pattern p = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");
    Matcher m = p.matcher(contents);
    // find() resumes after the previous match, so this walks through every tag in the file.
    while (m.find()) {
        if (m.start(1) != -1) {
            System.out.printf("%ncreating new component...%n");
            System.out.printf("  name: %s%n", m.group(1));
        } else if (m.start(2) != -1) {
            System.out.printf("  category: %s%n", m.group(2));
        } else if (m.start(3) != -1) {
            System.out.printf("  order code: %s%n", m.group(3));
        } else if (m.start(4) != -1) {
            System.out.printf("  description: %s%n", m.group(4));
        }
    }
}
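A sketch of how the whole-file loop could build objects instead of printing. The Component class here is a hypothetical stand-in for the question's class (plain fields instead of setters), and each component is added to the list as soon as its "Object name" line is seen, which removes the final "add the last one" step. Because . does not match line terminators by default, the name group stops at the end of its line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the question's Component class.
class Component {
    String name, category, orderCode, description;
}

class WholeFile {
    static final Pattern TAGS = Pattern.compile(
            "Object name\\.{7}: (.++)|"
            + "\\{CAT=([^\\}]++)\\}|"
            + "\\{CODE=([^\\}]++)\\}|"
            + "\\{DESC=([^\\}]++)\\}");

    static List<Component> parse(String contents) {
        List<Component> components = new ArrayList<>();
        Component current = null;
        Matcher m = TAGS.matcher(contents);
        while (m.find()) {
            if (m.start(1) != -1) {        // "Object name" starts a new component
                current = new Component();
                current.name = m.group(1);
                components.add(current);
            } else if (current == null) {
                continue;                  // tag before any "Object name": ignore it
            } else if (m.start(2) != -1) {
                current.category = m.group(2);
            } else if (m.start(3) != -1) {
                current.orderCode = m.group(3);
            } else {
                current.description = m.group(4);
            }
        }
        return components;
    }

    public static void main(String[] args) {
        String text = "Object name.......: PMF3800SN\n{CAT=Transistors}\n{CODE=1894711}\n";
        System.out.println(parse(text).get(0).orderCode); // 1894711
    }
}
```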
Answer 2 (score: 0)
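I would define a meta object, which is a Pattern plus a Runnable. Loop over the lines, and for each line loop over the meta objects; if one matches, execute its Runnable. Something like: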
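import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A pattern plus the action to run when a line matches it.
// A Consumer<Matcher> is used instead of a plain Runnable so that the action
// receives the successful Matcher and can read its groups.
class Meta {
    final Pattern pattern;
    final Consumer<Matcher> action;

    Meta(Pattern p, Consumer<Matcher> a) {
        pattern = p;
        action = a;
    }
}

Meta[] metas = new Meta[] { new Meta(Pattern.compile(...), matcher -> { ... }), new Meta(...), ... };
for (String line : lines) {
    for (Meta meta : metas) {
        Matcher matcher = meta.pattern.matcher(line);
        if (matcher.matches()) {
            meta.action.accept(matcher);
        }
    }
}
This is what the Meta object for the "Object name" line would look like:
Meta m = new Meta(Pattern.compile("Object name\\.{7}: (.++)"), matcher -> {
    // We've got the object name.
    // currentComponent and components must be fields of the enclosing class,
    // because a lambda cannot reassign a local variable.
    if (currentComponent != null) {
        components.add(currentComponent);
    }
    currentComponent = new Component();
    currentComponent.setName(matcher.group(1));
});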
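For comparison, a self-contained sketch of the same pattern-plus-action idea, keeping the pairs in a LinkedHashMap<Pattern, Consumer<Matcher>> so they are tried in insertion order. The TableDriven class and its events list (which stands in for building Component objects) are hypothetical; the lambdas require Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the pattern-plus-action table: each Pattern is paired with a
// Consumer<Matcher> that receives the successful Matcher.
class TableDriven {
    // Collected "field=value" strings; stands in for building Component objects.
    final List<String> events = new ArrayList<>();
    final Map<Pattern, Consumer<Matcher>> actions = new LinkedHashMap<>();

    TableDriven() {
        actions.put(Pattern.compile("Object name\\.{7}: (.++)"), m -> events.add("name=" + m.group(1)));
        actions.put(Pattern.compile("\\{CAT=([^}]++)\\}"), m -> events.add("category=" + m.group(1)));
        actions.put(Pattern.compile("\\{CODE=([^}]++)\\}"), m -> events.add("code=" + m.group(1)));
        actions.put(Pattern.compile("\\{DESC=([^}]++)\\}"), m -> events.add("description=" + m.group(1)));
    }

    List<String> process(List<String> lines) {
        for (String line : lines) {
            for (Map.Entry<Pattern, Consumer<Matcher>> e : actions.entrySet()) {
                Matcher m = e.getKey().matcher(line);
                if (m.matches()) {         // whole-line match, as in the answer
                    e.getValue().accept(m);
                    break;                 // first matching pattern wins
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Object name.......: PMF3800SN", "{MFR=NXP}", "{CODE=1894711}");
        System.out.println(new TableDriven().process(lines)); // [name=PMF3800SN, code=1894711]
    }
}
```

Because matches() must consume the whole line, an indented tag such as "   {CAT=Transistors}" would not match; in that case you would need to trim each line first or switch to find().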