Question

I'm trying to pull certain data out of strings that I parse. The strings usually follow a fixed pattern (about 90% of the time). Examples:

2 tsp ground nutmeg  (this should parse to {quantity = 2.0}{units = tsp}{ingredient = ground nutmeg})
1 cup water  (this should parse to {quantity = 1.0}{units = cup}{ingredient = water})

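I then parse each string with a regex or String.split(), usually splitting on spaces, and extract the data in the pattern {quantity} {unit} {ingredient}. The first space-separated token is the quantity. I then check the second space-separated token against all the standard units of measure, and take the rest of the string as the ingredient. All of this is used to populate a class called Ingredient, which has a quantity, a unit, and an ingredient name.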
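A minimal, self-contained sketch of the {quantity} {unit} {ingredient} split described above. It assumes a small hypothetical unit set (standing in for Measurement.getAllUnits()) and a hypothetical Ingredient record, so it needs Java 16 or later. The second token only counts as a unit when it matches a known unit exactly; otherwise the unit is left empty and everything after the quantity becomes the name, which is one way the "1 egg, beaten" case below could come out.

```java
import java.util.Arrays;
import java.util.Set;

public class SimpleIngredientParser {

    // Hypothetical unit list; Measurement.getAllUnits() plays this role in the question.
    static final Set<String> KNOWN_UNITS =
            Set.of("tsp", "tbsp", "cup", "cups", "ounce", "ounces", "lb");

    // Hypothetical value type standing in for the question's Ingredient class.
    record Ingredient(double quantity, String unit, String name) {}

    // Parses "2", "0.5", ".25" or "1/3"; throws NumberFormatException otherwise.
    static double parseQuantity(String token) {
        int slash = token.indexOf('/');
        if (slash >= 0) {
            return Double.parseDouble(token.substring(0, slash))
                    / Double.parseDouble(token.substring(slash + 1));
        }
        return Double.parseDouble(token);
    }

    // First token = quantity; second token = unit only on an exact (case-insensitive)
    // match against a known unit; all remaining tokens = ingredient name.
    static Ingredient parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        double quantity = parseQuantity(tokens[0]);
        String unit = "";
        int nameStart = 1;
        if (tokens.length > 1 && KNOWN_UNITS.contains(tokens[1].toLowerCase())) {
            unit = tokens[1].toLowerCase();
            nameStart = 2;
        }
        String name = String.join(" ", Arrays.copyOfRange(tokens, nameStart, tokens.length));
        return new Ingredient(quantity, unit, name);
    }

    public static void main(String[] args) {
        System.out.println(parse("2 tsp ground nutmeg"));
        System.out.println(parse("1 cup water"));
        System.out.println(parse("1 egg, beaten"));
    }
}
```

Requiring an exact match, rather than a substring match, keeps a short unit name from being found inside an ordinary word.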

The problem arises when a string doesn't fit this pattern.

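The strings I'm parsing are outside my control: they are simply handed to me, and every one of them matters, so I can't throw away the ones that don't fit my pattern. Everything other than the strings themselves is under my control, so I can change it as needed.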

Two examples that come to mind are:

1 egg, beaten
(not sure what I should parse this to... maybe {quantity = 1.0}{units = ""}{ingredient = egg, beaten})

1 (.25 ounce) package active dry yeast
(not sure how I should parse it, but it should end up as {quantity = 0.25}{units = ounce}{ingredient = active dry yeast})

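In the first example, I parse 1 as the quantity, search all my units for "egg", determine that it isn't a unit, and treat it as part of the ingredient.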

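The second example simply breaks my code: it takes the quantity (1), then looks for a unit (the only valid unit is ounce), and takes the rest as the ingredient name. The result is {quantity = 1}{units = ounce}{ingredient = .25 package active dry yeast}, which is wrong.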
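One possible way to cover the second example, sketched under assumptions: before the normal split, try a regex for the shape "count (size unit) ...", and treat the parenthesised size as the real amount. The container words (package, can, envelope) are a hypothetical list chosen for illustration. Multiplying the count by the size is also an assumption; for a count of 1 it gives the 0.25 ounce the question expects. The method returns null when the line doesn't have this shape, so the caller can fall back to the ordinary {quantity} {unit} {ingredient} parse.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PackageSizeParser {

    // count, "(size unit)", an optional container word, then the ingredient name.
    // The container words are a hypothetical list, not taken from the question.
    private static final Pattern PACKAGE_SIZE = Pattern.compile(
            "^(\\d+(?:\\.\\d+)?)\\s*\\(\\s*(\\d*\\.?\\d+)\\s+([A-Za-z]+)\\s*\\)\\s*"
            + "(?:(?:package|packages|can|cans|envelope|envelopes)\\s+)?(.+)$");

    // Hypothetical value type standing in for the question's Ingredient class (Java 16+).
    record Ingredient(double quantity, String unit, String name) {}

    // Returns null if the line is not of the "N (size unit) ..." form.
    static Ingredient parse(String line) {
        Matcher m = PACKAGE_SIZE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        double count = Double.parseDouble(m.group(1));
        double size = Double.parseDouble(m.group(2)); // Double.parseDouble accepts ".25"
        return new Ingredient(count * size, m.group(3).toLowerCase(), m.group(4).trim());
    }

    public static void main(String[] args) {
        System.out.println(parse("1 (.25 ounce) package active dry yeast"));
        System.out.println(parse("1 cup water")); // no parenthesised size, prints null
    }
}
```

Each extra shape handled this way is one more pattern tried in order, most specific first, with the plain split as the last resort.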

Here is the code that parses the ingredient from the string:

    public void setRecipeIngredientFromParsedString(String s) {
        // ignore blank input; trimming also keeps leading/trailing spaces from producing empty tokens
        String line = s.trim();
        if (line.isEmpty()) {
            return;
        }

        // split on runs of whitespace into tokens
        String[] tokens = line.split("\\s+");

        // the quantity is allowed to be a fraction:
        // 1 1/2 (mixed number), 1/3 (fraction) or 1 (whole number).
        // Consumed quantity tokens are blanked so they don't leak into the name.
        if (Fraction.isFraction(tokens[0])) {
            this.amount.setQuantity(Fraction.parseFraction(tokens[0]).toDecimal());
            tokens[0] = "";
        } else if (Fraction.isNumber(tokens[0])) {
            // check the length before reading tokens[1], so a one-token string can't throw
            if (tokens.length > 1 && Fraction.isFraction(tokens[1])) {
                this.amount.setQuantity(Fraction.parseFraction(tokens[0] + " " + tokens[1]).toDecimal());
                tokens[1] = "";
            } else {
                this.amount.setQuantity(Double.parseDouble(tokens[0]));
            }
            tokens[0] = "";
        }

        // this could be done better. Look through the tokens until one is a unit,
        // set the unit, blank that token and stop at the first match. An exact
        // (case-insensitive) match avoids false hits such as "cup" inside "cupcake".
        unitSearch:
        for (int i = 0; i < tokens.length; i++) {
            for (String unit : Measurement.getAllUnits()) {
                if (tokens[i].equalsIgnoreCase(unit)) {
                    this.amount.setUnit(unit);
                    tokens[i] = "";
                    break unitSearch;
                }
            }
        }

        // every remaining token should be the ingredient name, but strip leftover
        // digits, slashes and parentheses just in case (commas are kept: "egg, beaten")
        StringBuilder ingredientName = new StringBuilder();
        for (String token : tokens) {
            if (!token.isEmpty()) {
                ingredientName.append(token).append(' ');
            }
        }
        String name = ingredientName.toString()
                .replaceAll("[0-9/()]", "")
                .replaceAll("\\s+", " ");

        this.setName(name.trim());
    }
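The method depends on the question's own Fraction class, which isn't shown. Below is a self-contained sketch of the quantity rule stated in its comment (mixed number, fraction, or whole number), extended to decimals such as .25. QuantityParser, Quantity and tokensUsed are hypothetical names. Returning how many tokens the quantity used lets the caller skip exactly those tokens, rather than stripping digits from the name afterwards.

```java
public class QuantityParser {

    // The value read from the front of the token array, and how many tokens it used
    // (0 means the line does not start with a quantity). Requires Java 16+ for records.
    record Quantity(double value, int tokensUsed) {}

    // Whole numbers and decimals: "2", "0.5", ".25".
    static boolean isNumber(String t) {
        return t.matches("\\d+(\\.\\d+)?|\\.\\d+");
    }

    // Simple fractions with a non-zero denominator: "1/3", "3/4".
    static boolean isFraction(String t) {
        return t.matches("\\d+/\\d*[1-9]\\d*");
    }

    static double fractionValue(String t) {
        String[] parts = t.split("/");
        return Double.parseDouble(parts[0]) / Double.parseDouble(parts[1]);
    }

    // Accepts "1 1/2" (mixed number, two tokens), "1/3" (fraction) or "1" / ".25" (number).
    static Quantity read(String[] tokens) {
        if (tokens.length == 0) {
            return new Quantity(0, 0);
        }
        if (isFraction(tokens[0])) {
            return new Quantity(fractionValue(tokens[0]), 1);
        }
        if (isNumber(tokens[0])) {
            double whole = Double.parseDouble(tokens[0]);
            if (tokens.length > 1 && isFraction(tokens[1])) {
                return new Quantity(whole + fractionValue(tokens[1]), 2);
            }
            return new Quantity(whole, 1);
        }
        return new Quantity(0, 0);
    }

    public static void main(String[] args) {
        System.out.println(read("1 1/2 cups flour".split("\\s+")));
        System.out.println(read("1/3 cup sugar".split("\\s+")));
        System.out.println(read("egg".split("\\s+")));
    }
}
```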

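What I'm after are ideas or suggestions for parsing the correct data when a string doesn't exactly fit the normal pattern. I don't know every pattern that might occur, so hard-coding an if statement for each variation seems like a bad solution. How would you handle this data and pull the relevant information out of it?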

Thanks! Let me know if I can clarify anything.

How to extract data from parsed strings that deviate from the pattern
