Question

I am trying to extract n 3-tuples (Si, Pi, Vi) from a string.

The string contains at least one such 3-tuple. Pi and Vi are optional.

SomeTextxyz@S1((property(P1)val(V1))@S2((property(P2)val(V2))@S3
           |----------1-------------|----------2-------------|-- n

The desired output is:

Si,Pi,Vi.

So, for n occurrences in the string, the output should look like this:

[S1,P1,V1] [S2,P2,V2] ... [Sn-1,Pn-1,Vn-1] (without the brackets)

Example

The input string might look like this:

MyCarGarage@Mustang((property(PS)val(500))@Porsche((property(PS)val(425)).

After processing, the output should be:

Mustang,PS,500 Porsche,PS,425

Is there an efficient way to extract these 3-tuples with a regular expression (for example using C++ and std::regex), and what would it look like?

Answer 1

@(.*?)\(\(property\((.*?)\)val\((.*?)\)\) should do the trick.

Example: http://regex101.com/r/bD1rY2

@                # Matches the @ symbol
(.*?)            # Captures everything until it encounters the next part (ungreedy wildcard)
\(\(property\(   # Matches the string "((property(" the backslashes escape the parenthesis
(.*?)            # Same as the one above
\)val\(          # Matches the string ")val(" 
(.*?)            # Same as the one above
\)\)             # Matches the string "))"

I don't know how you would implement this in C++, but it should be easy :)
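For what it's worth, here is a minimal sketch of how the pattern above might be used with std::regex, assuming a standard library with working <regex> support. The names `Triple` and `extractTriples` are made up for illustration. Note that this pattern requires the property and val parts to be present, so an entry such as a trailing "@S3" without them is not reported.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical container for one (S, P, V) tuple.
struct Triple {
    std::string s;
    std::string p;
    std::string v;
};

// Collects every match of the pattern from Answer 1 in `input`.
std::vector<Triple> extractTriples(const std::string& input) {
    // A raw string literal avoids doubling every backslash.
    static const std::regex pattern(
        R"re(@(.*?)\(\(property\((.*?)\)val\((.*?)\)\))re");

    std::vector<Triple> result;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern);
         it != end; ++it) {
        const std::smatch& m = *it;
        // Groups 1..3 are S, P and V in that order.
        result.push_back({m[1].str(), m[2].str(), m[3].str()});
    }
    return result;
}
```

Calling `extractTriples` on the example input from the question should yield two entries, Mustang,PS,500 and Porsche,PS,425.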

Answer 2

http://ideone.com/S7UQpA

I used C's <regex.h> instead of std::regex, because std::regex was not yet implemented in g++ at the time of writing (g++ is what IDEONE uses). The regex I used:

"                        In C(++)? regexes are strings.
  @                      Literal match
  ([^(@]+)               As many non-@, non-( characters as possible.  This is group 1
  (                      Start another group (group 2)
    \\(\\(property\\(    Yet more literal matching
    ([^)]+)              As many non-) characters as possible.  Group 3.
    \\)val\\(            Literal again
    ([^)]+)              As many non-) characters as possible.  Group 4.
    \\)\\)               Literal parentheses
  )                      Close group 2
  ?                      Group 2 optional
"                        Close Regex
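For readers whose compiler does ship a working std::regex, the same pattern can be sketched there as well. This is only an illustration, not the code from the answer: `Entry` and `extractEntries` are invented names, and the optional group is detected through `sub_match::matched`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: property/value are only meaningful
// when hasDetails is true.
struct Entry {
    std::string name;
    std::string property;
    std::string value;
    bool hasDetails;
};

std::vector<Entry> extractEntries(const std::string& input) {
    // Same structure as the annotated pattern: group 1 is the name,
    // group 2 is the optional "((property(..)val(..))" part,
    // groups 3 and 4 are the property and the value.
    static const std::regex pattern(
        R"re(@([^(@]+)(\(\(property\(([^)]+)\)val\(([^)]+)\)\))?)re");

    std::vector<Entry> out;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern);
         it != end; ++it) {
        const std::smatch& m = *it;
        Entry e;
        e.name = m[1].str();
        e.hasDetails = m[2].matched;  // false when group 2 was skipped
        if (e.hasDetails) {
            e.property = m[3].str();
            e.value = m[4].str();
        }
        out.push_back(e);
    }
    return out;
}
```

With the schematic input from the question, which ends in "@S3", this sketch should report S1 and S2 with their property and value, and S3 with `hasDetails` set to false.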

And some C++:

int getMatches(char* haystack, item** items){

First, compute the length of the string (we will use it later) and the number of @ characters found in it (the maximum number of matches).

    int l = -1, ats = 0;
    while (haystack[++l])
        if (haystack[l] == '@')
            ats++;

malloc an array that is large enough.

    *items = (item*) malloc(ats * sizeof(item));
    item* arr = *items;

Compile the regex needle. REGEX is #defined elsewhere.

    regex_t needle;
    regcomp(&needle, REGEX, REG_ICASE|REG_EXTENDED);
    regmatch_t match[5];

ret holds the return value (0 means "match found", but you may want to catch other errors here). x is used to count the matches found.

    int ret;
    int x = -1;

Loop over the matches (ret is zero whenever a match is found).

    while (!(ret = regexec(&needle, haystack, 5, match,0))){
        ++x;

Get the name from match[1].

        int bufsize = match[1].rm_eo-match[1].rm_so + 1;
        arr[x].name = (char *) malloc(bufsize);
        strncpy(arr[x].name, &(haystack[match[1].rm_so]), bufsize - 1);
        arr[x].name[bufsize-1]=0x0;

Check that the property (match[3]) and the value (match[4]) were actually found.

        if (!(match[3].rm_so > l || match[3].rm_so < 0 || match[3].rm_eo > l || match[3].rm_eo < 0
                || match[4].rm_so > l || match[4].rm_so < 0 || match[4].rm_eo > l || match[4].rm_eo < 0)){

Get the property from match[3].

            bufsize = match[3].rm_eo-match[3].rm_so + 1;
            arr[x].property = (char *) malloc(bufsize);
            strncpy(arr[x].property, &(haystack[match[3].rm_so]), bufsize - 1);
            arr[x].property[bufsize-1]=0x0;

Get the value from match[4].

            bufsize = match[4].rm_eo-match[4].rm_so + 1;
            arr[x].value = (char *) malloc(bufsize);
            strncpy(arr[x].value, &(haystack[match[4].rm_so]), bufsize - 1);
            arr[x].value[bufsize-1]=0x0;
        } else {

Otherwise, set both property and value to NULL.

            arr[x].property = NULL;
            arr[x].value = NULL;
        }

Advance the haystack past the match and reduce the known length accordingly.

        haystack = &(haystack[match[0].rm_eo]);
        l -= match[0].rm_eo;
    }

Return the number of matches.

    return x+1;
}
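The walkthrough above leaves out the definitions of `item` and REGEX. As a rough, self-contained sketch of the same POSIX approach, the version below assumes a REGEX macro assembled from the annotated pattern and replaces the malloc'd `item` array with a std::vector of a hypothetical `Item` struct. It is not the code behind the ideone link, just one way the pieces could fit together.

```cpp
#include <regex.h>   // POSIX regex API, not part of standard C++
#include <string>
#include <vector>

// Assumed definition of REGEX, built from the annotated pattern.
#define REGEX "@([^(@]+)(\\(\\(property\\(([^)]+)\\)val\\(([^)]+)\\)\\))?"

// Hypothetical stand-in for the answer's `item` struct.
struct Item {
    std::string name;
    std::string property;
    std::string value;
    bool hasProperty;
};

std::vector<Item> getMatchesPosix(const char* haystack) {
    std::vector<Item> items;
    regex_t needle;
    if (regcomp(&needle, REGEX, REG_ICASE | REG_EXTENDED) != 0)
        return items;  // compilation failed, report no matches

    regmatch_t m[5];
    while (regexec(&needle, haystack, 5, m, 0) == 0) {
        Item it;
        it.name.assign(haystack + m[1].rm_so, m[1].rm_eo - m[1].rm_so);

        // An unmatched subexpression is reported with rm_so == -1.
        it.hasProperty = m[3].rm_so >= 0 && m[4].rm_so >= 0;
        if (it.hasProperty) {
            it.property.assign(haystack + m[3].rm_so, m[3].rm_eo - m[3].rm_so);
            it.value.assign(haystack + m[4].rm_so, m[4].rm_eo - m[4].rm_so);
        }
        items.push_back(it);

        // Continue searching right after the current match.
        haystack += m[0].rm_eo;
    }
    regfree(&needle);  // release the compiled pattern
    return items;
}
```

Using std::string and std::vector here avoids the manual buffer sizing and the need for the caller to free each field, at the cost of no longer being plain C.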

Hope this helps. Though I notice now that you never answered one crucial question: What have you tried?
