I need to extract the unit of measure and the quantity from a String. For example:
Original String // result:
abc1mgabc // extract 1 and mg separately
abc100mlabc //100 and ml
abc256kgabc //256 and kg
So far, I first use this regular expression:
(?i)\d{1,5}(mg|g|gr|kg|ml|l)
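to extract the quantity and the unit together and store them in the quant_unit string.
After that, I apply the two regular expressions \d and (?i)(mg|g|gr|kg|ml|l) to quant_unit to extract the quantity and the unit of measure, respectively.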
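But I think there must be a way to extract each item separately by applying only one regular expression (per item to extract) to the original string?
I thought about using something like:
original_string -> applyRegex(extract quantity and unit) -> applyRegex2(extract the quantity or the unit from that result).
Using the regex by itself or the Pattern class in Java.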
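The two-pass idea described above can be sketched roughly as follows. This is only an illustration: the class name TwoPassExtraction and the helper first are hypothetical, the unit alternation is hard-coded, and the quantity pattern uses \d+ rather than \d (an assumption, so that a multi-digit quantity such as 100 is returned whole).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
class TwoPassExtraction {
    static final Pattern QUANTITY_UNIT =
            Pattern.compile("(?i)\\d{1,5}(mg|g|gr|kg|ml|l)");
    // Assumption: \\d+ instead of \\d, so multi-digit quantities are kept whole.
    static final Pattern QUANTITY = Pattern.compile("\\d+");
    static final Pattern UNIT = Pattern.compile("(?i)(mg|g|gr|kg|ml|l)");

    // Returns the first match of the pattern, or "" if there is none.
    static String first(Pattern pattern, CharSequence input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        // Pass 1: isolate quantity and unit together.
        String quantUnit = first(QUANTITY_UNIT, "abc100mlabc");
        // Pass 2: split the isolated fragment into its two parts.
        System.out.println(first(QUANTITY, quantUnit));
        System.out.println(first(UNIT, quantUnit));
    }
}
```

The second pass only ever sees the short fragment found by the first one, which is why the simple patterns are enough there.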
I created an enum for easy access to the patterns:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Patterns {
    // the expression is: (?i)\d{1,5}(mg|g|gr|kg|ml|l)
    QUANTITY_UNIT("(?i)\\d{1,5}(" + MeasureUnit.getRegex() + ")"),
    // \d matches exactly one digit
    QUANTITY("\\d"),
    UNIT("(?i)(" + MeasureUnit.getRegex() + ")");

    private final Pattern pattern;

    Patterns(String patternString) {
        System.out.println(patternString);
        // compile is static, so call it on the Pattern class, not on the field
        pattern = Pattern.compile(patternString);
    }

    public Pattern getPattern() {
        return pattern;
    }

    public Matcher getMatcher(CharSequence input) {
        return getPattern().matcher(input);
    }

    // Returns the first match; throws IllegalStateException if nothing matches.
    public String findGroup(CharSequence input) {
        Matcher matcher = getMatcher(input);
        matcher.find();
        return matcher.group();
    }
}
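The MeasureUnit type used by the enum is not shown in the post. A minimal sketch of what it might look like, assuming it is an enum of unit symbols whose getRegex() joins them into an alternation; the constants, the getSymbol accessor and the longest-first ordering are all assumptions made for illustration:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Collectors;

// Hypothetical reconstruction; the real MeasureUnit is not shown in the post.
enum MeasureUnit {
    MG("mg"), G("g"), GR("gr"), KG("kg"), ML("ml"), L("l");

    private final String symbol;

    MeasureUnit(String symbol) {
        this.symbol = symbol;
    }

    public String getSymbol() {
        return symbol;
    }

    // Joins all symbols with '|', longest first, so that "gr" is tried before "g".
    public static String getRegex() {
        return Arrays.stream(values())
                .map(MeasureUnit::getSymbol)
                .sorted(Comparator.comparingInt((String s) -> s.length()).reversed())
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        System.out.println(getRegex()); // prints mg|gr|kg|ml|g|l
    }
}
```

Sorting longest-first is a design choice of this sketch: regex alternation takes the first alternative that matches, so a one-letter symbol listed before a longer symbol with the same first letter would win.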
Unit tests for the desired behavior:
import org.junit.Assert;
import org.junit.Test;

public class PatternsTest {

    @Test
    public void quantityUnit() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY_UNIT.findGroup(testString);
        // findGroup returns the match exactly as it appears in the input
        Assert.assertEquals("1kg", fg);
    }

    @Test
    public void quantity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY.findGroup(testString);
        Assert.assertEquals("1", fg);
    }

    @Test
    public void unity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.UNIT.findGroup(testString);
        Assert.assertEquals("kg", fg);
    }
}
I did some refactoring based on the comments and the answer, and now it works:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public enum Patterns {
    // group 1 is the quantity, group 2 is the unit
    QUANTITY_UNIT("(?i)([0-9]+)(" + MeasureUnit.getRegex() + ")");

    private final Pattern pattern;

    Patterns(String patternString) {
        // compile is static, so call it on the Pattern class, not on the field
        pattern = Pattern.compile(patternString);
    }

    public Pattern getPattern() {
        return pattern;
    }

    public Matcher getMatcher(CharSequence input) {
        return getPattern().matcher(input);
    }

    // Quantity of the first match, or "" if nothing matches.
    public String getQuantity(CharSequence input) {
        final int group_idx = 1;
        Matcher matcher = getMatcher(input);
        boolean found = matcher.find();
        return found ? toLower(matcher.group(group_idx)) : "";
    }

    private String toLower(String input) {
        return input.toLowerCase();
    }

    // Unit of the first match in lower case, or "" if nothing matches.
    public String getUnity(CharSequence input) {
        final int group_idx = 2;
        Matcher matcher = getMatcher(input);
        boolean found = matcher.find();
        return found ? toLower(matcher.group(group_idx)) : "";
    }
}
import org.junit.Assert;
import org.junit.Test;

public class MeasureUnityTest {

    @Test
    public void quantity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("1", fg);
    }

    @Test
    public void unity() {
        String testString = "abc1kgabc1l";
        String fg = Patterns.QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("kg", fg);
    }

    @Test
    public void unityUpperCase() {
        String testString = "abc1KGabc1l";
        String fg = Patterns.QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("kg", fg);
    }

    @Test
    public void unityNoOccurrence() {
        String testString = "fasfasfasfaf";
        String fg = Patterns.QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("", fg);
    }

    @Test
    public void unityEmptyString() {
        String testString = "";
        String fg = Patterns.QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("", fg);
    }

    /* If more than one matches, return the first */
    @Test
    public void unityMoreThanOne() {
        String testString = "abc5mlabc5kg";
        String fg = Patterns.QUANTITY_UNIT.getUnity(testString);
        Assert.assertEquals("ml", fg);
    }

    /* If more than one matches, return the first */
    @Test
    public void quantityMoreThanOne() {
        String testString = "abcm5mlabc1kg";
        String fg = Patterns.QUANTITY_UNIT.getQuantity(testString);
        Assert.assertEquals("5", fg);
    }
}
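The refactored enum runs the matcher once per getter, so reading both values means matching twice. A possible variant, sketched here under assumptions, reads both capture groups from a single find(). The Measurement class, its fields and parseFirst are hypothetical names, and the unit alternation is hard-coded because MeasureUnit is not shown:

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class; names and fields are assumptions for illustration.
final class Measurement {
    private static final Pattern QUANTITY_UNIT =
            Pattern.compile("(?i)([0-9]+)(gr|kg|mg|ml|g|l)");

    final String quantity;
    final String unit;

    private Measurement(String quantity, String unit) {
        this.quantity = quantity;
        this.unit = unit;
    }

    // Returns the first quantity/unit pair in the input, or empty if none is found.
    static Optional<Measurement> parseFirst(CharSequence input) {
        Matcher matcher = QUANTITY_UNIT.matcher(input);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // Group 1 is the digits, group 2 is the unit; the unit is normalized to lower case.
        return Optional.of(new Measurement(
                matcher.group(1),
                matcher.group(2).toLowerCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        parseFirst("abc5KGabc1l")
                .ifPresent(m -> System.out.println(m.quantity + " " + m.unit));
    }
}
```

Returning an Optional instead of an empty string makes the "no match" case explicit for the caller; whether that is preferable to the "" convention tested above is a matter of taste.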
Answer 0 (score: 2)
Summarizing all the comments, you can use something like this (IDEONE link):
String[] tests = { "abc1mgabc", "abc100mlabc", "abc256kgabc" };
Pattern ptrn = Pattern.compile("(?i)([0-9]+)(gr|kg|mg|ml|g|l)");
for (String s : tests) {
    Matcher matcher = ptrn.matcher(s);
    while (matcher.find()) {
        System.out.println("QNTY: " + matcher.group(1));
        System.out.println("UNIT:" + matcher.group(2));
    }
}
Output:
QNTY: 1
UNIT:mg
QNTY: 100
UNIT:ml
QNTY: 256
UNIT:kg
See the IDEONE demo.
Key points:
The + quantifier (one or more occurrences) is sufficient, instead of the limiting {1,5}, which can only match 1 to 5 occurrences.
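To make the difference concrete, here is a small sketch comparing the patterns on a few inputs. The class name QuantifierDemo and the helper firstMatch are hypothetical. The last two calls illustrate a second difference visible in the pattern above, which the answer does not spell out: it lists the two-letter units before g and l, and alternation takes the first alternative that matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class QuantifierDemo {
    // Returns the first match of the regex in the input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        // {1,5} silently keeps only the last five digits of a longer number.
        System.out.println(firstMatch("(?i)\\d{1,5}(gr|kg|mg|ml|g|l)", "abc1234567mgabc")); // 34567mg
        // + keeps the whole run of digits.
        System.out.println(firstMatch("(?i)[0-9]+(gr|kg|mg|ml|g|l)", "abc1234567mgabc")); // 1234567mg
        // With g listed before gr, the shorter unit wins.
        System.out.println(firstMatch("(?i)[0-9]+(mg|g|gr|kg|ml|l)", "abc5grabc")); // 5g
        // With gr listed first, the full unit is matched.
        System.out.println(firstMatch("(?i)[0-9]+(gr|kg|mg|ml|g|l)", "abc5grabc")); // 5gr
    }
}
```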