I have a CSV file containing the following data:
396124436476092000,Think about the life you livin but don't think so hard it hurts Life is truly a gift, but at the same it is a curse,Obey_Jony09
396124436740317184,"“@BleacherReport: Halloween has given us this amazing Derrick Rose photo (via @amandakaschube, @ScottStrazzante) http://t.co/tM0wEugZR1” yes",Colten_stamkos
I wrote the following code in Pig Latin to load the data into alias B, using the delimiters in REGEX_EXTRACT_ALL. This command outputs all the data represented by (.*):
A = load '/user/pig/tweets' as (line);
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(.*)[,”:-](.*)[“,:-](.*)')) AS (tweetid:long,msg:chararray,userid:chararray);
So I would like to know how the regular expression function works with the expression '(.*)[,”:-](.*)[“,:-](.*)' to split the data into the schema (tweetid, msg, userid).
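To see how the three (.*) groups divide up a line, the same pattern can be run outside Pig. The sketch below is only an approximation. It uses Python's `re` module as a stand-in for the Java regex engine that Pig relies on, and it uses `fullmatch` on the assumption that REGEX_EXTRACT_ALL requires the whole line to match. The variable names are illustrative.

```python
import re

# The pattern from the question, unchanged. Inside a character class,
# a trailing "-" is a literal hyphen, so each class matches one delimiter.
PATTERN = re.compile(r'(.*)[,”:-](.*)[“,:-](.*)')

# First sample row from the CSV above.
line = ("396124436476092000,Think about the life you livin but don't think "
        "so hard it hurts Life is truly a gift, but at the same it is a curse,"
        "Obey_Jony09")

# Assumption: the whole line must match, hence fullmatch rather than search.
match = PATTERN.fullmatch(line)
tweetid, msg, userid = match.groups()

# Each (.*) is greedy: it takes as much as it can and gives characters
# back only until the rest of the pattern can still match.
print(repr(tweetid))
print(repr(msg))
print(repr(userid))
```

Under these assumptions the first group does not stop at the first comma. Because (.*) is greedy, it extends to the second-to-last delimiter, so the first field holds the ID plus most of the message text, the second field holds " but at the same it is a curse", and the third holds "Obey_Jony09". A field that must stop at the first comma would need something like `([^,]*)` instead of `(.*)`.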