I'm learning Pig on my own and ran into an exception while trying to explore a dataset. What is wrong with the script, and why?
The error is raised at the end of the MapReduce execution. The script is:
movies_data = LOAD '/movies_data' using PigStorage(',') as (id:chararray,title:chararray,year:int,rating:double,duration:double);
high = FILTER movies_data by rating > 4.0;
high_rated = FOREACH high GENERATE movies_data.title,movies_data.year,movies_data.rating,movies_data.duration;
DUMP high_rated;
Answer 0 (score: 1)
First, let's look at how to fix the problem. You don't need to use the alias to access your fields. Your third line can simply be:
high_rated = FOREACH high GENERATE title, year, rating, duration;
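If for some reason you want to use the alias, you should use the disambiguate operator (::), as the error message suggests. Note that Pig normally prepends the alias:: prefix to field names only after operators such as JOIN, COGROUP, CROSS, or FLATTEN, so this form may not resolve on a relation produced by a plain FILTER. Your line would then look like this: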
high_rated = FOREACH high GENERATE movies_data::title, movies_data::year, movies_data::rating, movies_data::duration;
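Next, let's try to understand the exact cause of the error message. When you access a field with the dot operator (.), Pig assumes the alias is a scalar, that is, a relation with exactly one row. Because your alias has multiple rows, Pig complains. You can read more about scalars in Pig here: https://issues.apache.org/jira/browse/PIG-1434
At the end of the release notes section of that JIRA, you will notice that the expected error message matches the one you ran into: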
If a relation contains more than single tuple, a runtime error is generated:
"Scalar has more than one row in the output"
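For contrast, here is a minimal sketch of a case where the dot operator on a relation is legitimate, because the relation is guaranteed to hold a single row. It reuses the question's LOAD statement; the aliases all_movies, stats, avg_rating, and with_diff are hypothetical names introduced only for illustration.

```pig
-- Same input and schema as in the question.
movies_data = LOAD '/movies_data' USING PigStorage(',')
    AS (id:chararray, title:chararray, year:int, rating:double, duration:double);

-- GROUP ... ALL collapses the whole relation into one group,
-- so the aggregate below yields exactly one row.
all_movies = GROUP movies_data ALL;
stats = FOREACH all_movies GENERATE AVG(movies_data.rating) AS avg_rating;

-- stats has a single row, so stats.avg_rating can be used as a scalar here.
with_diff = FOREACH movies_data GENERATE title, rating,
    rating - (double) stats.avg_rating AS diff_from_avg;

DUMP with_diff;
```

If stats ever held more than one row, this script would fail with the same "Scalar has more than one row in the output" error that the question's script triggers.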
Answer 1 (score: 0)
This should work for you without errors.
movies_data = LOAD '/movies_data' using PigStorage(',') as (id:chararray,title:chararray,year:int,rating:double,duration:double);
high = FILTER movies_data by rating > 4.0;
high_rated = FOREACH high GENERATE title,year,rating,duration;
DUMP high_rated;
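The FILTER command keeps every record that satisfies the filter condition, with all of its columns, so the fields can be referenced directly by name.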