I wrote a regular expression:
value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s(.*)?\\s\\(
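It works fairly well, but the problem is that the last part of it keeps on matching...
For example, it is meant to work on books, and I am testing it with the following: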
value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (F110)</option>
value='C,201301,F114,JEWL1050'>JEWL1050 - Industry Skills I (F114)</option>
value='C,201301,F114,JEWL1054'>JEWL1054 - Jewellery Rendering & Illustra (F114)</option>
value='C,201301,F110,JEWL2029'>JEWL2029 - Production Techniques B (F110)</option>
value='C,201301,F114,JEWL2029'>JEWL2029 - Production Techniques B (F114)</option>
value='C,201301,LIAD,LANG9066'>LANG9066 - Italian For Beginners (LIAD)</option>
value='C,201301,T302,LAW1151'>LAW1151 - Canandian & Environmental Law (T302)</option>
value='C,201301,T305,LAW1151'>LAW1151 - Canandian & Environmental Law (T305)</option>
value='C,201301,F402,LAW1152'>LAW1152 - International Law & Agreements (F402)</option>
value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (T302)</option>
value='C,201301,T303,LAW3201'>LAW3201 - Protection Legislation (T303)</option>
value='C,201301,T304,LAW3201'>LAW3201 - Protection Legislation (T304)</option>
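So for the first book, it should capture F110 as group 1, JEWL1050 as group 2, and Industry Skills I as group 3.
However, it captures the first two groups correctly but not the last one. It captures - Industry Skills I (F110)</option> instead.
Any ideas on how I can fix my regex? I can't seem to get the last group right. Please help. Thanks in advance.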
Answer 0 (score: 1)
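In theory, this should work as is.
Here is your regex (with \\ changed to \, because the tool takes a raw pattern rather than a Java string literal) applied to your sample input: http://regex101.com/r/hL8pZ8
This tool also offers a "Java" checkbox and even generates the corresponding Java code, although it has no permalinks, so you have to enter the regex (again with \ instead of \\) and supply the sample data yourself: http://www.myregextester.com/index.php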
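The difference between \\ in Java source and \ in an online tester can be checked directly. The following is a minimal sketch; the class name, constant, and helper method are hypothetical and only illustrate how a Java string literal turns into the raw pattern the regex engine sees.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // A fragment of the question's regex, written the way Java source requires.
    static final String JAVA_LITERAL = "\\s-\\s\\(";

    // True when the whole input is: whitespace, hyphen, whitespace, open parenthesis.
    static boolean matchesSeparator(String input) {
        return Pattern.compile(JAVA_LITERAL).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Each "\\" in the source becomes one backslash at run time,
        // so this prints the raw pattern an online tester expects: \s-\s\(
        System.out.println(JAVA_LITERAL);
        System.out.println(JAVA_LITERAL.length());      // prints 7
        System.out.println(matchesSeparator(" - ("));   // prints true
    }
}
```

Pasting the Java-escaped form into a tester that expects a raw pattern would make each \\ mean a literal backslash, which is why the pattern has to be unescaped first.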
That said, for posterity, here is that tool's output:
Raw Match Pattern:
value='[A-Za-z]+\,[0-9]+\,([A-Za-z0-9]+)\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\s-\s(.*)?\s\(
Java Code Example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
    public static void main(String[] asd) {
        String sourcestring = "source string to match with pattern";
        Pattern re = Pattern.compile("value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s(.*)?\\s\\(");
        Matcher m = re.matcher(sourcestring);
        int mIdx = 0;
        while (m.find()) {
            for (int groupIdx = 0; groupIdx < m.groupCount() + 1; groupIdx++) {
                System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
            }
            mIdx++;
        }
    }
}
$matches Array:
(
[0] => Array
(
[0] => value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (
[1] => value='C,201301,F114,JEWL1050'>JEWL1050 - Industry Skills I (
[2] => value='C,201301,F114,JEWL1054'>JEWL1054 - Jewellery Rendering & Illustra (
[3] => value='C,201301,F110,JEWL2029'>JEWL2029 - Production Techniques B (
[4] => value='C,201301,F114,JEWL2029'>JEWL2029 - Production Techniques B (
[5] => value='C,201301,LIAD,LANG9066'>LANG9066 - Italian For Beginners (
[6] => value='C,201301,T302,LAW1151'>LAW1151 - Canandian & Environmental Law (
[7] => value='C,201301,T305,LAW1151'>LAW1151 - Canandian & Environmental Law (
[8] => value='C,201301,F402,LAW1152'>LAW1152 - International Law & Agreements (
[9] => value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (
[10] => value='C,201301,T303,LAW3201'>LAW3201 - Protection Legislation (
[11] => value='C,201301,T304,LAW3201'>LAW3201 - Protection Legislation (
)
[1] => Array
(
[0] => F110
[1] => F114
[2] => F114
[3] => F110
[4] => F114
[5] => LIAD
[6] => T302
[7] => T305
[8] => F402
[9] => T302
[10] => T303
[11] => T304
)
[2] => Array
(
[0] => JEWL1050
[1] => JEWL1050
[2] => JEWL1054
[3] => JEWL2029
[4] => JEWL2029
[5] => LANG9066
[6] => LAW1151
[7] => LAW1151
[8] => LAW1152
[9] => LAW3201
[10] => LAW3201
[11] => LAW3201
)
[3] => Array
(
[0] => Industry Skills I
[1] => Industry Skills I
[2] => Jewellery Rendering & Illustra
[3] => Production Techniques B
[4] => Production Techniques B
[5] => Italian For Beginners
[6] => Canandian & Environmental Law
[7] => Canandian & Environmental Law
[8] => International Law & Agreements
[9] => Protection Legislation
[10] => Protection Legislation
[11] => Protection Legislation
)
)
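To reproduce this result outside the online tools, here is a minimal sketch that applies the question's pattern, unchanged, to the first sample line. The class name OptionRegexDemo and the helper extract are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionRegexDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern OPTION = Pattern.compile(
        "value='[A-Za-z]+\\,[0-9]+\\,([A-Za-z0-9]+)\\,([A-Za-z0-9]+)'>[A-Za-z0-9]+\\s-\\s(.*)?\\s\\(");

    // Returns groups 1 to 3 of the first match, or null when nothing matches.
    static String[] extract(String line) {
        Matcher m = OPTION.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "value='C,201301,F110,JEWL1050'>JEWL1050 - Industry Skills I (F110)</option>";
        // prints F110 | JEWL1050 | Industry Skills I
        System.out.println(String.join(" | ", extract(line)));
    }
}
```

Note that (.*) is greedy: if several options sit on one line with no line break between them, group 3 extends to the last " (" on that line. Whether that is what happened in the question is not confirmed; if it is, a reluctant (.*?) would be one possible adjustment.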
Answer 1 (score: 1)
Answer 2 (score: 1)
I have checked that C,201301 is not needed. So a simple solution is to treat everything between < and > as junk and focus only on the text from > to <:
<option value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (T302)</option>
<option value='C,201301,T303,LAW3201'>LAW3201 - Protection Legislation (T303)</option>
<option value='C,201301,T304,LAW3201'>LAW3201 - Protection Legislation (T304)</option>
This suggests:
>([A-Z]+[0-9]+)\\s-\\s(.*)?\\s\\(([A-Z0-9]+)\\)<
as a sufficient expression for the three groups.
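A minimal sketch of this approach in Java, assuming the course code is always letters followed by digits and the section code is always wrapped in literal parentheses. The class name BetweenTagsDemo and the helper extract are hypothetical. Note that the group order differs from the question: the course code comes first and the section code last.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenTagsDemo {
    // Matches only the visible text between '>' and '<':
    // course code, " - ", title, then the section code in parentheses.
    static final Pattern TEXT = Pattern.compile(
        ">([A-Z]+[0-9]+)\\s-\\s(.*)?\\s\\(([A-Z0-9]+)\\)<");

    // Returns {course code, title, section code}, or null when nothing matches.
    static String[] extract(String line) {
        Matcher m = TEXT.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "<option value='C,201301,T302,LAW3201'>LAW3201 - Protection Legislation (T302)</option>";
        // prints LAW3201 | Protection Legislation | T302
        System.out.println(String.join(" | ", extract(line)));
    }
}
```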