How to pipe/chain regular expressions

Date: 2016-02-14 21:56:37

Tags: java regex pattern-matching

I have to extract the unit of measure and the quantity from a String, as in these samples:

Original String // result:

abc1mgabc // extract 1 and mg separately
abc100mlabc //100 and ml
abc256kgabc //256 and kg

So far, as a first step, I use this regex:

(?i)\d{1,5}(mg|g|gr|kg|ml|l)

to extract the quantity and the unit together and store the match in the quant_unit string.

Then I apply two more regexes to quant_unit, \d and (?i)(mg|g|gr|kg|ml|l), to extract the "quantity" and the "unit of measure" respectively.
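A minimal sketch of the two-step approach described above, reusing the same regexes (the class name TwoStepDemo and the helper firstMatch are hypothetical). It also shows a pitfall in the second step: \d matches a single digit, so for abc100mlabc it returns only 1, while \d+ returns 100.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoStepDemo {
    // Step 1: quantity and unit together, as in the question.
    static final Pattern QUANT_UNIT = Pattern.compile("(?i)\\d{1,5}(mg|g|gr|kg|ml|l)");
    // Step 2 patterns, applied to the result of step 1.
    static final Pattern SINGLE_DIGIT = Pattern.compile("\\d");
    static final Pattern DIGITS = Pattern.compile("\\d+");
    static final Pattern UNIT = Pattern.compile("(?i)(mg|g|gr|kg|ml|l)");

    // Returns the first match of the pattern in the input, or "" if there is none.
    static String firstMatch(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String quantUnit = firstMatch(QUANT_UNIT, "abc100mlabc");   // "100ml"
        System.out.println(firstMatch(SINGLE_DIGIT, quantUnit));    // 1 (only the first digit)
        System.out.println(firstMatch(DIGITS, quantUnit));          // 100
        System.out.println(firstMatch(UNIT, quantUnit));            // ml
    }
}
```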

But shouldn't there be a way to extract each item separately by applying just one regex (per item to extract) to the original string?

I thought of something like:

original_string -> applyRegex(extract quantity and unit) -> applyRegex2(extract the quantity or the unit from that)

using plain regexes or Java's Pattern class.
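The applyRegex -> applyRegex2 idea can be sketched with plain java.util.regex. The helper chain below is hypothetical: it feeds the first match of each pattern into the next one and returns an empty string as soon as a stage finds nothing. One benefit of chaining is that the second stage only sees the first stage's match, so a unit-like letter elsewhere in the string (the g in glass) is ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexChain {
    // Hypothetical helper: applies each pattern in turn to the previous stage's
    // first match. Returns "" as soon as a stage finds nothing.
    static String chain(String input, Pattern... stages) {
        String current = input;
        for (Pattern stage : stages) {
            Matcher m = stage.matcher(current);
            if (!m.find()) {
                return "";
            }
            current = m.group();
        }
        return current;
    }

    public static void main(String[] args) {
        Pattern quantUnit = Pattern.compile("(?i)\\d+(gr|kg|mg|ml|g|l)");
        Pattern quantity = Pattern.compile("\\d+");
        Pattern unit = Pattern.compile("(?i)(gr|kg|mg|ml|g|l)");

        System.out.println(chain("abc256kgabc", quantUnit, quantity)); // 256
        System.out.println(chain("abc256kgabc", quantUnit, unit));     // kg
        // Applied directly to the whole string, the unit regex stops at the "g" of "glass".
        System.out.println(chain("glass5kg", unit));                   // g
        System.out.println(chain("glass5kg", quantUnit, unit));        // kg
    }
}
```

Each stage rescans the previous result; capturing groups, as in the answer, get both values from a single match instead.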

I created an enum to make the patterns easy to access:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Patterns {

    //the expression is: (?i)\d{1,5}(mg|g|gr|kg|ml|l)
    QUANTITY_UNIT("(?i)\\d{1,5}(" + MeasureUnit.getRegex() + ")"),
    QUANTITY("\\d"),
    UNIT("(?i)(" + MeasureUnit.getRegex() + ")");

    private final Pattern pattern;

    Patterns(String patternString) {
        System.out.println(patternString);
        pattern = Pattern.compile(patternString); // compile is static, so call it on the class
    }

    public Pattern getPattern() {
        return pattern;
    }

    public Matcher getMatcher(CharSequence input) {
        return getPattern().matcher(input);
    }

    public String findGroup(CharSequence input) {
        Matcher matcher = getMatcher(input);
        matcher.find();
        // group() throws IllegalStateException if find() returned false
        return matcher.group();
    }
}
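The question does not show MeasureUnit. One possible shape, purely an assumption: an enum of unit symbols whose getRegex() joins them into an alternation, sorted longest first so that gr is tried before g (the ordering point made in the answer's key points).

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical: the question does not show MeasureUnit, so this is only an assumed shape.
enum MeasureUnit {
    MG("mg"), G("g"), GR("gr"), KG("kg"), ML("ml"), L("l");

    private final String symbol;

    MeasureUnit(String symbol) {
        this.symbol = symbol;
    }

    // Joins the symbols into "a|b|c", longest first. The sort is stable, so
    // symbols of equal length keep their declaration order.
    static String getRegex() {
        return Arrays.stream(values())
                .map(u -> u.symbol)
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .collect(Collectors.joining("|"));
    }
}
```

With these constants, getRegex() returns mg|gr|kg|ml|g|l.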

Unit tests describing the desired behavior:

import org.junit.Assert;
import org.junit.Test;

public class PatternsTest {

    @Test
    public void quantityUnit() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY_UNIT.findGroup(testString);
        Assert.assertEquals("1KG", fg);
    }


    @Test
    public void quantity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY.findGroup(testString);
        Assert.assertEquals("1", fg);
    }


    @Test
    public void unity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.UNIT.findGroup(testString);
        Assert.assertEquals("kg", fg);
    }

}

EDIT

Based on the comments and the answer, I did some refactoring and now it works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Patterns {

    // group 1 = quantity, group 2 = unit
    QUANTITY_UNIT("(?i)([0-9]+)(" + MeasureUnit.getRegex() + ")");

    private final Pattern pattern;

    Patterns(String patternString) {
        pattern = Pattern.compile(patternString);
    }

    public Pattern getPattern() {
        return pattern;
    }

    public Matcher getMatcher(CharSequence input) {
        return getPattern().matcher(input);
    }

    public String getQuantity(CharSequence input) {
        final int group_idx = 1;

        Matcher matcher = getMatcher(input);
        boolean found = matcher.find();
        return found ? toLower(matcher.group(group_idx)) : "";
    }

    private String toLower(String input) {
        return input.toLowerCase();
    }

    public String getUnity(CharSequence input) {
        final int group_idx = 2;

        Matcher matcher = getMatcher(input);
        boolean found = matcher.find();
        return found ? toLower(matcher.group(group_idx)) : "";
    }
}
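In the refactored enum, getQuantity and getUnity each call find() again on the same input. If both values are needed, a single match can return them together. A minimal sketch, assuming Java 8+ (for Optional); the class QuantityUnit and its parse method are hypothetical, and the unit list is inlined instead of calling MeasureUnit.getRegex(). Only the unit is lower-cased, since the quantity is all digits.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class holding the quantity and unit from one match.
final class QuantityUnit {
    // Unit list inlined (longer alternatives first) instead of MeasureUnit.getRegex().
    private static final Pattern PATTERN =
            Pattern.compile("(?i)([0-9]+)(gr|kg|mg|ml|g|l)");

    final String quantity;
    final String unit;

    private QuantityUnit(String quantity, String unit) {
        this.quantity = quantity;
        this.unit = unit;
    }

    // Runs find() once and reads both capturing groups from the same match.
    static Optional<QuantityUnit> parse(CharSequence input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new QuantityUnit(m.group(1), m.group(2).toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        QuantityUnit qu = parse("abc1KGabc1l").get();
        System.out.println(qu.quantity + " " + qu.unit);        // 1 kg
        System.out.println(parse("fasfasfasfaf").isPresent());  // false
    }
}
```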

Tests:

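import org.junit.Assert;
import org.junit.Test;
// QUANTITY_UNIT is assumed to be statically imported from Patterns (its package is not shown).

public class MeasureUnityTest {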


    @Test
    public void quantity() {
        String testString = "abc1kgabc1l";
        String fg = QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("1", fg);
    }


    @Test
    public void unity() {
        String testString = "abc1kgabc1l";
        String fg = QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("kg", fg);
    }

    @Test
    public void unityUpperCase() {
        String testString = "abc1KGabc1l";
        String fg = QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("kg", fg);
    }


    @Test
    public void unityNoOccurrence() {
        String testString = "fasfasfasfaf";
        String fg = QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("", fg);
    }

    @Test
    public void unityEmptyString() {
        String testString = "";
        String fg = QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("", fg);
    }

    /* If more than one matches, return the first*/
    @Test
    public void unityMoreThanOne() {
        String testString = "abc5mlabc5kg";
        String fg = QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("ml", fg);
    }

    /* If more than one matches, return the first*/
    @Test
    public void quantityMoreThanOne() {
        String testString = "abcm5mlabc1kg";
        String fg = QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("5", fg);
    }

}

1 answer:

Answer 0 (score: 2)

Summing up all the comments, you can use something like this (IDEONE link):

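import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String[] tests = { "abc1mgabc", "abc100mlabc", "abc256kgabc" };
        // group 1 = quantity, group 2 = unit; longer units listed before shorter ones
        Pattern ptrn = Pattern.compile("(?i)([0-9]+)(gr|kg|mg|ml|g|l)");
        for (String s : tests) {
            Matcher matcher = ptrn.matcher(s);
            while (matcher.find()) {
                System.out.println("QNTY: " + matcher.group(1));
                System.out.println("UNIT:" + matcher.group(2));
            }
        }
    }
}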

Output:

QNTY: 1
UNIT:mg
QNTY: 100
UNIT:ml
QNTY: 256
UNIT:kg

See the IDEONE demo.

Key points:

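  • You can use 2 capturing groups to capture both entities with a single regex: group 1 holds the quantity and group 2 the unit.
  • Make sure the longer alternatives come first in the alternation (e.g. gr before g), because a Java regex takes the first alternative that matches rather than the longest one, as the POSIX standard would.
  • If you are not sure how large the numbers can be, use the + quantifier (one or more occurrences) instead of the bounded {1,5}, which matches only 1 to 5 digits.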
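A small sketch checking the last two points with the same kind of pattern (the class AlternationOrderDemo and the helper firstGroup are hypothetical). With g listed before gr, abc5grabc yields the unit g; with {1,5}, a six-digit quantity is not rejected but silently truncated, because the match simply starts one digit later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationOrderDemo {
    // Returns the given capturing group of the first match, or "" if there is none.
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        // "g" listed before "gr": the engine accepts "g" and stops there.
        System.out.println(firstGroup("(?i)([0-9]+)(mg|g|gr|kg|ml|l)", "abc5grabc", 2));          // g
        // Longer alternative first: "gr" is tried before "g".
        System.out.println(firstGroup("(?i)([0-9]+)(gr|kg|mg|ml|g|l)", "abc5grabc", 2));          // gr
        // {1,5} caps the digits, so only the last five digits of 123456 are captured.
        System.out.println(firstGroup("(?i)([0-9]{1,5})(gr|kg|mg|ml|g|l)", "abc123456kgabc", 1)); // 23456
        System.out.println(firstGroup("(?i)([0-9]+)(gr|kg|mg|ml|g|l)", "abc123456kgabc", 1));     // 123456
    }
}
```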