I am trying to extract strings from input data that looks like this:
I love [[cricket]]. Let's play it at [[16:00]].
I want the output to be: cricket, 16:00
I have tried several regular expressions, for example:
'(?<=\[\[).*?(?=\]\])',
'\\[\\[(.*?)\\]\\]',
'\[\[(.*?)\]\]',
'[[([^>]*?)]]'.
grunt> Register '/usr/local/pig/lib/piggybank.jar';
grunt> Define Xpath org.apache.pig.piggybank.evaluation.xml.XPath();
grunt> page = Load 'hdfs://master:9000/IP/Wikipedia-20181215070630.xml' using org.apache.pig.piggybank.storage.XMLLoader('page') as (x : chararray);
link = Foreach page Generate Flatten(REGEX_EXTRACT_ALL(x, '(?<=\[\[).*?(?=\]\])'));
Whenever my regex contains '[', Pig throws the error message below:
<line 1, column 64> Unexpected character '['
2018-12-22 04:41:24,535 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 64> Unexpected character '['.
When I try the other regular expressions listed above, I get blank output.
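
For reference, here is a minimal sketch in plain Java (outside Pig) of the extraction I am after. It assumes Pig's regex functions follow java.util.regex syntax. The class name BracketExtract and the method extract are hypothetical names used only for illustration. The sketch also shows that the same pattern fails when it has to match the whole string rather than a substring. As far as I understand, REGEX_EXTRACT_ALL may require a whole-string match, which could explain the blank output, but I have not confirmed this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or piggybank.
class BracketExtract {
    // The regex \[\[(.*?)\]\] written as a Java string literal,
    // so every backslash is doubled.
    private static final Pattern LINK = Pattern.compile("\\[\\[(.*?)\\]\\]");

    // Collects the text inside every [[...]] pair, in order of appearance.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {          // find() scans for substrings
            out.add(m.group(1));    // group 1 is the text between the brackets
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "I love [[cricket]]. Let's play it at [[16:00]].";
        // prints: cricket, 16:00
        System.out.println(String.join(", ", extract(input)));
        // prints: false (the pattern does not cover the entire string)
        System.out.println(LINK.matcher(input).matches());
    }
}
```

With find() the pattern yields exactly the two values I expect, so the regex itself seems fine in Java. The open question is how to quote it inside a Pig string literal and how to get one row per match.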