Question

I have a string in the field 'product' in the following form:

    ";TT_RAV;44;22;"

I want to split first on ';' and then on '_', so that what is returned is

    "RAV"

I know I can do this:

    parse_1 = FOREACH data {  -- 'data' is a placeholder for the input relation
        splitup = STRSPLIT(product, ';', 3);
        GENERATE splitup.$1 AS depiction;
    };
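To see what `splitup.$1` holds, here is a minimal Java sketch of the three-way split. It assumes that Pig's STRSPLIT passes its limit argument straight to Java's `String.split(regex, limit)`; the class and method names are illustrative only.

```java
import java.util.Arrays;

// Sketch: mimics STRSPLIT(product, ';', 3), assuming STRSPLIT delegates
// to Java's String.split(regex, limit).
class StrSplitLimitDemo {
    // Split on ';' into at most 3 pieces.
    static String[] splitProduct(String product) {
        return product.split(";", 3);
    }

    public static void main(String[] args) {
        String[] parts = splitProduct(";TT_RAV;44;22;");
        // The leading ';' yields an empty first field; the last piece keeps the remainder.
        System.out.println(Arrays.toString(parts)); // [, TT_RAV, 44;22;]
        System.out.println(parts[1]);               // TT_RAV (what splitup.$1 projects)
    }
}
```

Because the limit caps the number of pieces, everything after the second ';' stays together in the third field.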

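This returns the string 'TT_RAV', and then I can do another split and project out 'RAV'. However, this seems like it would pass the data through multiple Map jobs. Is it possible to parse out the desired field in a single pass?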

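This example doesn't work, because the inner STRSPLIT returns a tuple, but it shows the logic:

    parse_1 = FOREACH data {  -- 'data' is a placeholder for the input relation
        splitup = STRSPLIT(STRSPLIT(product, ';', 3), '_', 1);
        GENERATE splitup.$1 AS depiction;
    };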
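Besides the tuple problem, the inner call's limit of 1 would also leave the string unsplit. The Java sketch below, again assuming STRSPLIT follows `String.split(regex, limit)`, shows both points: a limit of 1 returns the whole input as a single field, and the intended logic needs field $1 of the ';' split followed by a two-way '_' split. The names are hypothetical.

```java
// Sketch of the two-level split described above, assuming STRSPLIT
// behaves like Java's String.split(regex, limit).
class NestedSplitDemo {
    // A limit of 1 means "no split at all": the whole string comes back
    // as one field, so there is no $1 to project.
    static int fieldsWithLimitOne(String s) {
        return s.split("_", 1).length;
    }

    // Intended logic: field $1 of the ';' split, then field $1 of a
    // two-way '_' split of that value.
    static String depiction(String product) {
        String second = product.split(";", 3)[1]; // "TT_RAV"
        return second.split("_", 2)[1];           // "RAV"
    }

    public static void main(String[] args) {
        System.out.println(fieldsWithLimitOne("TT_RAV")); // 1
        System.out.println(depiction(";TT_RAV;44;22;"));   // RAV
    }
}
```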

Is it possible to do this in pure Pig without multiple map phases?

Answer 1

Don't use STRSPLIT. You're looking for REGEX_EXTRACT:

    REGEX_EXTRACT(product, '_([^;]*);', 1) AS depiction
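A quick way to check what this pattern captures outside Pig is to run it through `java.util.regex`. This sketch assumes REGEX_EXTRACT searches for the first match (as `Matcher.find()` does) and returns the requested group, or null when nothing matches; `regexExtract` is a hypothetical helper, not Pig's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: approximates REGEX_EXTRACT(product, '_([^;]*);', 1), assuming
// first-match (find) semantics and null when there is no match.
class RegexExtractDemo {
    static String regexExtract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Capture everything after the first '_' up to the next ';'.
        System.out.println(regexExtract(";TT_RAV;44;22;", "_([^;]*);", 1)); // RAV
    }
}
```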

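If you want to pick out precisely the second semicolon-separated field, and then the second underscore-separated subfield within it, you can make the regex more complicated:

    REGEX_EXTRACT(product, '^[^;]*;[^_;]*_([^_;]*)', 1) AS depiction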

Here's a breakdown of how that regex works:

    ^      // Start at the beginning
    [^;]*  // Match as many non-semicolons as possible, if any (first field)
    ;      // Match the semicolon; now we'll start the second field
    [^_;]* // Match any characters in the first subfield
    _      // Match the underscore; now we'll start the second subfield (what we want)
    (      // Start capturing!
    [^_;]* // Match any characters in the second subfield
    )      // End capturing
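The difference between the two patterns only shows up when an earlier field also contains an underscore. The Java sketch below compares them under the same assumption as before (first-match `find()` semantics for REGEX_EXTRACT); the input `"A_B;TT_RAV;44"` is a made-up value for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch comparing the simple and the anchored patterns, assuming
// REGEX_EXTRACT uses first-match (find) semantics.
class AnchoredRegexDemo {
    static final Pattern SIMPLE = Pattern.compile("_([^;]*);");
    static final Pattern ANCHORED = Pattern.compile("^[^;]*;[^_;]*_([^_;]*)");

    static String extract(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Both patterns agree on the sample value from the question.
        System.out.println(extract(SIMPLE, ";TT_RAV;44;22;"));   // RAV
        System.out.println(extract(ANCHORED, ";TT_RAV;44;22;")); // RAV
        // Hypothetical value with an underscore in the first field: the simple
        // pattern stops there, the anchored one still reaches the second field.
        System.out.println(extract(SIMPLE, "A_B;TT_RAV;44"));    // B
        System.out.println(extract(ANCHORED, "A_B;TT_RAV;44"));  // RAV
    }
}
```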

Answer 2

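The only time you get multiple map phases is when you have an operator that triggers a reduce (JOIN, GROUP, etc.). Consecutive FOREACH statements with no such operator between them run in the same map phase, so splitting twice does not add a pass over the data. If you run EXPLAIN on your script, you can see whether there are multiple reduce stages.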
