I am trying to execute a Pig script stored in HDFS, and I am getting this error:
ERROR 2999: Unexpected internal error. null
java.lang.NullPointerException
at org.apache.pig.impl.io.FileLocalizer.fetchFilesInternal(FileLocalizer.java:734)
at org.apache.pig.impl.io.FileLocalizer.fetchFiles(FileLocalizer.java:699)
at org.apache.pig.PigServer.registerJar(PigServer.java:522)
at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:473)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:546)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:613)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
However, if I execute the same statements one at a time in the Grunt shell, they work fine. I am using the following command to run the script:
pig -x mapreduce hdfs://quickstart.cloudera:8020/user/cloudera/oozie/PigScript/tweetAnalysis.pig
Here is the script:
--REGISTER required Json Jars for parsing Json
REGISTER '/usr/lib/pig/lib/json-simple-1.1.jar'
REGISTER '/home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-hadoop-compat-4.1.jar'
REGISTER '/home/cloudera/Desktop/hadoopProgram/jar/elephant-bird-pig-4.1.jar'
-- Parsing and loading Company.cfg file
parsing_company = LOAD 'company' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
flatten_company = FOREACH parsing_company GENERATE FLATTEN($0#'relevanceInfo');
flatten_company_1 = FOREACH flatten_company GENERATE FLATTEN($0) AS mymap;
extract_company_details = FOREACH flatten_company_1 GENERATE mymap#'companyName' AS companyName, FLATTEN(mymap#'names') AS mymapNew;
company_results = FOREACH extract_company_details GENERATE companyName, mymapNew#'name' AS Names;
-- Parsing and loading AllsightTweets file
tweets_test = LOAD 'AllsightTweets' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap;
extract_tweets_details = FOREACH tweets_test GENERATE myMap#'text' AS text;
-- Parsing and loading keywords file
load_keyword = LOAD 'keywords' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
foreach_keywords = FOREACH load_keyword GENERATE FLATTEN($0#'test') AS mymap;
flatten_keywords = FOREACH foreach_keywords GENERATE FLATTEN(mymap#'keywords') AS mymapNew;
result_keywords = FOREACH flatten_keywords GENERATE mymapNew#'keyword' AS keyword, mymapNew#'category' AS category;
-- Cross product between company and tweet relations
cross_company_tweet = CROSS company_results, extract_tweets_details;
-- Cross product between the result above and the keyword relation
cross_company_tweet_keywords = CROSS cross_company_tweet, result_keywords;
-- Keep records whose tweet text matches both the company name and a keyword
res = FILTER cross_company_tweet_keywords BY ((text MATCHES CONCAT(CONCAT('.*',company_results::companyName),'.*')) AND (text MATCHES CONCAT(CONCAT('.*',result_keywords::keyword),'.*')));
-- Group the result based on company name and category
res_group = GROUP res BY (company_results::companyName, result_keywords::category);
res_group_count = FOREACH res_group GENERATE FLATTEN(group) AS (company_results::companyName, result_keywords::category), COUNT($1);
-- Store the result in HDFS
STORE res_group_count INTO 'PigResultLatest' USING PigStorage(',');