I tried this script on nested data:
books = load 'data/book-seded-workings-reduced.json'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');
group_auth = group books by title;
maped = foreach group_auth generate group, books.authors;
fil = foreach maped generate flatten(books);
DUMP fil;
But I get this error: A column needs to be projected from a relation for it to be used as a scalar
Any ideas?
Answer (score: 2)
books = load 'input.data'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');
flatten_authors = foreach books generate title, FLATTEN(authors.name);
dump flatten_authors;
Output (using the input from "Loading JSON file with serde in Cloudera"):
(Modern Database Systems: The Object Model, Interoperability, and Beyond.,null)
(Inequalities: Theory of Majorization and Its Application.,Albert W. Marshall)
(Inequalities: Theory of Majorization and Its Application.,Ingram Olkin)
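If you want to keep the `group ... by title` step from the question, the error most likely comes from `flatten(books)` in the last foreach: `books` is not a field of `maped`, so Pig resolves it as the relation alias and tries to use that relation as a scalar. A minimal sketch of the grouped version, assuming the same schema as above; the aliases `per_book`, `per_author`, `authors` and `name` are illustrative choices, not from the original answer:

```pig
-- Load with the same JsonLoader schema used in the answer above.
books = load 'input.data'
using JsonLoader('user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,authors:{(name:chararray)},source:chararray');

group_auth = group books by title;

-- books.authors is a bag with one tuple per book, each tuple holding that
-- book's authors bag; this FLATTEN emits one row per book in the group.
-- Note: refer to fields of group_auth here, not to the relation alias `books`
-- on its own, which Pig would treat as a scalar.
per_book = foreach group_auth generate group as title, FLATTEN(books.authors) as (authors);

-- A second FLATTEN unnests each authors bag: one row per (title, author name).
per_author = foreach per_book generate title, FLATTEN(authors.name) as (name);

dump per_author;
```

Grouping only clusters the rows by title here; if you do not need per-title aggregation, the single `FLATTEN(authors.name)` in the answer is the simpler route.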