Pig FLATTEN error

Asked: 2014-08-15 20:13:51

Tags: hadoop apache-pig flatten cloudera-cdh

I tried this script on nested data:

books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

group_auth = group books by title;

maped = foreach group_auth generate group, books.authors;

fil = foreach maped generate flatten(books);

DUMP fil;

But I get this error: A column needs to be projected from a relation for it to be used as a scalar

Any ideas?

1 Answer:

Answer 0 (score: 2):

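-- Load the JSON records. The schema string is kept on a single line
-- to avoid a line break inside the quoted literal.
books = load 'input.data'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

-- Flatten the authors bag directly, without grouping first:
-- one output row per (title, author name) pair.
flatten_authors = foreach books generate title, FLATTEN(authors.name);

dump flatten_authors;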

Output (input taken from "Loading JSON file with serde in Cloudera"):

(Modern Database Systems: The Object Model, Interoperability, and Beyond.,null)
(Inequalities: Theory of Majorization and Its Application.,Albert W. Marshall)
(Inequalities: Theory of Majorization and Its Application.,Ingram Olkin)
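
On why the question's script fails: after `maped = foreach group_auth generate group, books.authors;` the relation `maped` has only two fields, `group` and `authors`. There is no field called `books` in it, so in `flatten(books)` Pig reads `books` as a reference to the relation `books` itself and tries to use it as a scalar, which is what the error message is complaining about. If the grouping step is actually needed, something along these lines should work. This is an untested sketch: the aliases `author_bags`, `per_book` and `names` are made up for illustration, and the positional reference `$1` is used because the field name after the first flatten may carry a prefix.

```pig
-- Same load statement as in the question.
books = load 'data/book-seded-workings-reduced.json'
    using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

group_auth = group books by title;

-- Name the projected field explicitly so later statements refer to a
-- field of this relation, not to the relation `books`.
-- author_bags is a bag of tuples, each holding one authors bag.
maped = foreach group_auth generate group as title, books.authors as author_bags;

-- First flatten: one row per original book record, (title, authors bag).
per_book = foreach maped generate title, flatten(author_bags);

-- Second flatten: one row per (title, author name).
names = foreach per_book generate title, flatten($1);

-- Print the schema to check the field names before dumping.
describe names;
dump names;
```

Running `describe maped;` on the original script is a quick way to confirm which field names are available at each step.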