Question

We are trying to build a resource expansion feature for a REST service. The expansion can be requested with the following pattern:


    fields=field1,field2(sf1,sf2),field3[Format],field4(sf1,sf2,sf3)

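What is the best way to parse this? The parsing has to happen on every incoming request, so it needs to perform well.

We are checking whether a regular expression can be defined for this. What would a regular expression for this pattern look like?

Edit (March 10, 2014): The string contains only metadata (Java field names), and it can also be multi-level:

    field1(sf1,sf2(sf21,sf22)),field2[Format],field3[Format],field4(sf1,sf2,sf3)

Should I use a regular expression or parse it manually?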

Answer 1

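Regular expressions do not support nested/balanced syntax. For example, parsing a mathematical expression and making sure every opening parenthesis has a properly balanced closing parenthesis, or parsing XML or HTML to make sure every element is properly closed, requires a more expressive grammar. (For the academic explanation, see the Chomsky hierarchy, and note in particular the difference between regular and context-free languages.)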
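To make the limitation concrete, here is a minimal sketch (the class name `FlatFieldRegex` and the pattern are my own illustration, not something from the question). It assumes field names consist of word characters only. The pattern handles the flat, single-level form, but on nested input it can only report names in the order it finds them, so the hierarchy is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlatFieldRegex {
    // A name, optionally followed by (sub,fields) or [Format].
    // The [^()]* part cannot contain parentheses, so nesting is not matched.
    private static final Pattern TERM =
        Pattern.compile("(\\w+)(?:\\(([^()]*)\\)|\\[([^\\]]*)\\])?");

    /** Returns the field names the pattern finds, in order of appearance. */
    static List<String> names(String raw) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(raw);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Flat input: the four top-level fields are found.
        System.out.println(names("field1,field2(sf1,sf2),field3[Format],field4(sf1,sf2,sf3)"));
        // Nested input: sf1 and sf2 show up as if they were top-level fields.
        System.out.println(names("field1(sf1,sf2(sf21,sf22)),field2[Format]"));
    }
}
```

For the nested string, `field1` is matched without its parenthesized part and the sub-fields `sf1` and `sf2` are then reported alongside it, which is exactly the structural information a plain regex cannot keep track of.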

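To parse a language with nested syntax you need the equivalent of a "pushdown automaton" (PDA), but don't worry: all these fancy terms are actually trivial in practice. You can solve the problem with recursion or a loop, using a regular expression in each iteration, or simply write your own parsing method.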
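As a rough sketch of the "write your own parsing method" option, a small recursive descent parser for the syntax in the question could look like the following. The names `FieldSpecParser` and `Field` are hypothetical, and the grammar is my assumption from the examples: a name, optionally followed by a parenthesized sub-list, optionally followed by a bracketed format. Unlike a lenient parser, this one rejects malformed input, and it expects a non-null string.

```java
import java.util.ArrayList;
import java.util.List;

class FieldSpecParser {
    /** One parsed term: a name, an optional [Format], and nested sub-fields. */
    static final class Field {
        final String name;
        String format;                        // null when no [Format] was given
        final List<Field> children = new ArrayList<>();
        Field(String name) { this.name = name; }
    }

    private final String src;
    private int pos = 0;

    private FieldSpecParser(String src) { this.src = src; }

    /** Parses the whole string or throws IllegalArgumentException. */
    static List<Field> parse(String raw) {
        FieldSpecParser p = new FieldSpecParser(raw);
        List<Field> fields = p.parseList();
        if (p.pos != raw.length()) {
            throw new IllegalArgumentException("Unexpected '" + raw.charAt(p.pos) + "' at " + p.pos);
        }
        return fields;
    }

    // list := term (',' term)*
    private List<Field> parseList() {
        List<Field> out = new ArrayList<>();
        out.add(parseTerm());
        while (peek() == ',') {
            pos++;
            out.add(parseTerm());
        }
        return out;
    }

    // term := name ('(' list ')')? ('[' name ']')?
    private Field parseTerm() {
        Field f = new Field(readName());
        if (peek() == '(') {
            pos++;
            f.children.addAll(parseList());   // recursion handles any nesting depth
            expect(')');
        }
        if (peek() == '[') {
            pos++;
            f.format = readName();
            expect(']');
        }
        return f;
    }

    private String readName() {
        int start = pos;
        while (pos < src.length() && isNameChar(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a name at " + pos);
        return src.substring(start, pos);
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Returns '\0' at end of input so callers need no bounds checks.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }
}
```

Calling `FieldSpecParser.parse("field1(sf1,sf2(sf21,sf22)),field2[Format]")` gives two top-level `Field` objects, with `sf2` carrying its own two children. The parser makes a single pass over the string with no backtracking, which should be cheap enough to run on every request.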

I recently implemented exactly this feature in a REST API, and although my syntax is slightly different, I suspect you may find this code useful:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Given a single packed string that defines a recursive set of fields,
 * this will parse and return a Map of terms from the root level where the
 * term is mapped to the packed string defining the sub-fields within that key.
 *  
 * Assume the primary/root result is a Movie...
 *              --(raw==null) get all movie First Order (FO) attributes
 * stars        --get all movie FO, and expand stars relation
 * title        --get movies FO id and title
 * title,stars  --get movies FO id and title, and expand stars relation
 * 
 * stars{}      --get all movie FO, and expand stars relation (same as stars)
 * stars{name}  --get all movie FO, and expand stars relation getting star FO id and name
 * stars{contractStudio}    --get all movie FO, expand stars relation getting all star FO and expand stars contract studio
 * stars{name,contractStudio}   --get all movie FO, and expand stars relation getting star FO id and name and expand stars contract studio
 * title,stars{name,contractStudio{name,founded}}   --get movies FO id and title, and expand stars relation getting star FO id and name and expand stars contract studio with the studio FO name and founded date
 */
private Map<String, String> parseRequestParameter(String raw) {
    if (raw == null || raw.isEmpty()) return Collections.emptyMap();
    Map<String, String> results = new HashMap<>();
    int i = 0;
    int j = 0;
    while (j < raw.length()) {
        char c = raw.charAt(j);
        //move j to end of attr name
        while (c != '{' && c != ',' && ++j < raw.length()) {c = raw.charAt(j);}
        String attr = raw.substring(i, i = j).trim();
        if (!attr.isEmpty()) {
            //capture the optional sub-expansion
            if (c == '{') {
                i++;  //move i past the opening '{'
                int pDepth = 1;
                while (pDepth > 0 && ++j < raw.length()) {  //pDepth is depth of nested { }
                    pDepth += (c = raw.charAt(j)) == '{' ? 1 : (c == '}' ? -1 : 0);
                }
                results.put(attr, raw.substring(i, j).trim());
                if (++j < raw.length()) c = raw.charAt(i = j);  //move i and c past the closing '}'
            }
            else {
                results.put(attr, null);
            }
        }
        //skip any unexpected suffix trash... only ',' marks next term.
        while ((i = ++j) < raw.length() && c != ',') {c = raw.charAt(j);}
    }
    return results;
}

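In our case, as you may have inferred from the Javadoc, if no expansion string is specified we return all "first order" (FO) attributes of the result. If specific attributes are named, they are either expansion terms (if they name a relation attribute that can be expanded) or narrowing terms (if they name an FO attribute). If any narrowing terms are specified, the rendered result contains only the requested terms. In addition, we always return the id, regardless of which terms were requested.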

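The method above only parses the raw value that contains the expansion specification. It produces a Map whose keys are the individual terms at the top level of the expansion specification. The values are the expansion specifications (still packed) that need to be applied to each term when it is expanded. This is where recursion comes into play. Obviously that happens at a higher level than this method, and I think you can get there from here.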
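To illustrate that recursive step, here is a minimal sketch that eagerly turns a packed specification into a tree by splitting one level and then recursing into each term's packed sub-specification. This is not the code from my API: `ExpansionTree`, `Node`, and `splitTopLevel` are hypothetical names, and `splitTopLevel` is a simplified stand-in for the parser above so that the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExpansionTree {
    /** Terms requested at one level, each mapped to the terms requested beneath it. */
    static final class Node {
        final Map<String, Node> terms = new LinkedHashMap<>();
    }

    /** Recursive step: split one level, then recurse into each packed sub-spec. */
    static Node build(String raw) {
        Node node = new Node();
        for (Map.Entry<String, String> e : splitTopLevel(raw).entrySet()) {
            // A null or empty sub-spec simply yields a Node with no terms.
            node.terms.put(e.getKey(), build(e.getValue()));
        }
        return node;
    }

    /** Simplified stand-in for the parser above: term -> packed sub-spec (or null). */
    static Map<String, String> splitTopLevel(String raw) {
        Map<String, String> out = new LinkedHashMap<>();
        if (raw == null) return out;
        int depth = 0;
        int start = 0;
        for (int i = 0; i <= raw.length(); i++) {
            boolean end = (i == raw.length());
            char c = end ? ',' : raw.charAt(i);
            if (c == '{') depth++;
            else if (c == '}') depth--;
            // Only a comma outside all braces (or the end of input) closes a term.
            if (end || (c == ',' && depth <= 0)) {
                addTerm(out, raw.substring(start, i));
                start = i + 1;
                depth = 0;
            }
        }
        return out;
    }

    private static void addTerm(Map<String, String> out, String term) {
        int open = term.indexOf('{');
        if (open < 0) {
            if (!term.trim().isEmpty()) out.put(term.trim(), null);
            return;
        }
        String name = term.substring(0, open).trim();
        int close = term.lastIndexOf('}');
        int stop = close > open ? close : term.length();   // tolerate a missing '}'
        if (!name.isEmpty()) out.put(name, term.substring(open + 1, stop).trim());
    }
}
```

With `build("title,stars{name,contractStudio{name,founded}}")`, the root node holds `title` and `stars`, and `stars` in turn holds `name` and `contractStudio`. In a real service you would more likely recurse lazily, handing each packed sub-specification to whatever renders the related entity, rather than building the whole tree up front.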

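This method is quite robust. It assumes the raw value may contain unbalanced braces and junk characters. When it encounters them, it ignores them and salvages as much as it can from the raw value. It is a "fail last" approach.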

Answer 2

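Without knowing the data values for each field, trying to come up with a specific regex could be very difficult.

Why not use POST or PUT and put the data values in the message body? That way you can use JSON to organize the data. (OK... XML or YAML would work too, but I like JSON.)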

Parsing resource expansion for a REST service
