I'm trying to find out how many words there are on each line of a file in Pig. I have loaded and split it:
raw = load file;
words = FOREACH raw GENERATE TOKENIZE(*);
This gives me a bag of tuples, each containing one word. Then I try to count these items:
counts = FOREACH words GENERATE COUNT(*);
I get this error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
...
Caused by: java.lang.NullPointerException
Is this because some lines produce empty bags? Or is there something else I'm doing wrong?
Answer 0 (score: 0)
If the problem is empty bags, you could try something like this (untested):
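raw = LOAD 'file' AS (line:chararray);
words = FOREACH raw GENERATE TOKENIZE(line) AS tokenized_words;
counts = FOREACH words GENERATE (tokenized_words IS NULL ? 0L : COUNT(tokenized_words)) AS total_count;
Here the bincond operator (?:) acts as an if-else: if tokenized_words is null, we assign 0; otherwise we return its total count. An empty bag needs no special case, because COUNT already returns 0 for it. Note that TRIM only accepts a chararray, so it cannot be used to test whether a bag is empty.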
Answer 1 (score: 0)
Could you try it this way?
Input:
Hi hello how are you
this is apache pig
works
like a charm
Pig script:
A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE TOKENIZE(line);
C = FOREACH B GENERATE COUNT($0);
DUMP C;
Output:
(5)
(4)
(1)
()
(3)