Error while executing a Pig script

Time: 2018-02-27 05:46:32

Tags: hdfs apache-pig

I am trying to execute a Pig script that is stored in HDFS, and I am getting an error.

Pig stack trace

ERROR 2999: Unexpected internal error. null

java.lang.NullPointerException
    at org.apache.pig.impl.io.FileLocalizer.fetchFilesInternal(FileLocalizer.java:734)
    at org.apache.pig.impl.io.FileLocalizer.fetchFiles(FileLocalizer.java:699)
    at org.apache.pig.PigServer.registerJar(PigServer.java:522)
    at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:473)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:546)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:613)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

But if I execute the same script statement by statement in the grunt shell, it works fine. The following command is used to execute the script:

pig -x mapreduce hdfs://quickstart.cloudera:8020/user/cloudera/oozie/PigScript/tweetAnalysis.pig 
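The stack trace fails inside `registerJar` → `FileLocalizer.fetchFiles`, i.e. while Pig is resolving the jars named in the script's REGISTER statements. As a first, hedged sanity check (the jar paths are copied verbatim from the script below; nothing else is assumed, and this is a diagnostic sketch rather than a confirmed fix), one can verify that each jar actually exists on the machine that launches Pig:

```shell
# Sanity check: the NPE occurs while Pig fetches the REGISTER'd jars,
# so confirm each path from the script exists on the launching machine.
status=ok
for jar in /usr/lib/pig/lib/json-simple-1.1.jar \
           /home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-hadoop-compat-4.1.jar \
           /home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-pig-4.1.jar; do
  if [ -e "$jar" ]; then
    echo "found:   $jar"
  else
    echo "missing: $jar"
    status=incomplete
  fi
done
echo "check: $status"
```

If any jar is reported missing on the node where `pig -x mapreduce` is invoked, the REGISTER step is a plausible place for the fetch to fail when the script itself is read from HDFS.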

Here is the script:

-- REGISTER required JSON jars for parsing JSON

REGISTER '/usr/lib/pig/lib/json-simple-1.1.jar'
REGISTER '/home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-hadoop-compat-4.1.jar'
REGISTER '/home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-pig-4.1.jar'

-- Parsing and loading Company.cfg file 
parsing_company = LOAD 'company' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
flatten_company = FOREACH parsing_company GENERATE FLATTEN($0#'relevanceInfo');
flatten_company_1 = FOREACH flatten_company GENERATE FLATTEN($0) AS mymap;
extract_company_details = FOREACH flatten_company_1 GENERATE mymap#'companyName' AS companyName, FLATTEN(mymap#'names') AS mymapNew;
company_results = FOREACH extract_company_details GENERATE companyName, mymapNew#'name' AS Names;

-- Parsing and loading AllsightTweets file  
tweets_test = LOAD 'AllsightTweets' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
extract_tweets_details = FOREACH tweets_test GENERATE myMap#'text' AS text;

-- Parsing and loading keywords file    
load_keyword = LOAD 'keywords' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
foreach_keywords = FOREACH load_keyword GENERATE FLATTEN($0#'test') AS mymap;
flatten_keywords = FOREACH foreach_keywords GENERATE FLATTEN(mymap#'keywords') AS mymapNew;
result_keywords = FOREACH flatten_keywords GENERATE mymapNew#'keyword' AS keyword, mymapNew#'category' AS category;

-- Cross product between company and tweet relations    
cross_company_tweet = CROSS company_results, extract_tweets_details;

-- Cross product between the outcome of first relation with keyword relations
cross_company_tweet_keywords = CROSS cross_company_tweet, result_keywords;

-- Filter the records where tweet matches with keywords and company 
res = FILTER cross_company_tweet_keywords BY ((text MATCHES CONCAT(CONCAT('.*',company_results::companyName),'.*')) AND (text MATCHES CONCAT(CONCAT('.*',result_keywords::keyword),'.*')));

-- Group the result based on company name and category
res_group = GROUP res BY (company_results::companyName, result_keywords::category);
res_group_count = FOREACH res_group GENERATE FLATTEN(group) AS (company_results::companyName, result_keywords::category), COUNT($1);

-- store the result in HDFS 
STORE res_group_count INTO 'PigResultLatest' USING PigStorage(',');
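For readers following the tail of the script, the CROSS → FILTER (MATCHES on '.*X.*' patterns) → GROUP/COUNT steps amount to the following Python sketch. The tuples here are made-up toy stand-ins, not the real JSON inputs; Pig's MATCHES anchors against the whole string (like Java's String.matches), hence re.fullmatch:

```python
import itertools
import re
from collections import Counter

# Toy stand-ins for the three relations built in the script.
company_results = [("Acme", "Acme Inc")]                  # (companyName, Names)
tweets = ["Acme ships a great product", "unrelated tweet"]
keywords = [("great", "positive"), ("bad", "negative")]   # (keyword, category)

# CROSS company_results x tweets x keywords, then FILTER with '.*X.*'
# regexes, mirroring the MATCHES(CONCAT(...)) expressions in the script.
matches = [
    (company, category)
    for (company, _), text, (kw, category)
    in itertools.product(company_results, tweets, keywords)
    if re.fullmatch(".*" + company + ".*", text)
    and re.fullmatch(".*" + kw + ".*", text)
]

# GROUP BY (companyName, category) and COUNT, as in res_group_count.
counts = Counter(matches)
print(counts)  # Counter({('Acme', 'positive'): 1})
```

Only the first toy tweet mentions both the company name and a keyword, so a single (company, category) group survives the filter.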

0 Answers:

No answers yet.