Hive playing with ARRAY/STRUCT

Time: 2018-02-26 17:36:29

Tags: hadoop hive user-defined-functions

I am using the IMDB data set for one of my POCs (proofs of concept).

The data is available here.

A sample of the data looks like this:

nm0000006   Ingrid Bergman  1915    1982    actress,soundtrack,producer tt0038109,tt0071877,tt0034583,tt0038787
nm0000007   Humphrey Bogart 1899    1957    actor,soundtrack,producer   tt0033870,tt0038355,tt0034583,tt0040897
nm0000008   Marlon Brando   1924    2004    actor,soundtrack,director   tt0068646,tt0047296,tt0078346,tt0078788
nm0000009   Richard Burton  1925    1984    actor,producer,soundtrack   tt0057877,tt0061184,tt0065207,tt0087803
nm0000010   James Cagney    1899    1986    actor,soundtrack,director   tt0042041,tt0029870,tt0055256,tt0035575
nm0000011   Gary Cooper 1901    1961    actor,soundtrack,producer   tt0044706,tt0049233,tt0033891,tt0027996

The table I created is:

CREATE EXTERNAL TABLE casts (id STRING, name STRING, birthYear INT, deathYear INT, profession ARRAY<STRING>, titles ARRAY<STRING>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' LINES TERMINATED BY '\n'
TBLPROPERTIES ("skip.header.line.count"="1");
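A quick sanity check on how the arrays were parsed may help before writing the real queries. For ROW FORMAT DELIMITED, Hive's default collection-item delimiter is '\002' (Ctrl-B), so unless the table declares COLLECTION ITEMS TERMINATED BY ',', a value such as actress,soundtrack,producer ends up as a one-element array. A minimal sketch, assuming the casts table above:

```sql
-- If the arrays are split on commas, size(profession) should be 3 and
-- size(titles) 4 for the sample rows; with the default '\002' delimiter
-- every array would have size 1 and hold the whole comma-joined string.
SELECT name,
       size(profession) AS n_professions,
       profession[0]    AS first_profession,
       size(titles)     AS n_titles
FROM casts
LIMIT 5;
```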

I want to run a query such as: who were the actors in a particular movie title (say, tt0057877)?
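One way to do this without a UDF is the built-in array_contains(array, value), which tests whether an array holds a value. A minimal sketch against the casts table, assuming the arrays are split on commas. If the file is IMDb's name.basics data (an assumption), the last column is its knownForTitles list, so this returns people who are known for the title rather than a full cast list:

```sql
-- People whose titles array contains the given title id
-- and whose profession array contains 'actor'.
SELECT id, name
FROM casts
WHERE array_contains(titles, 'tt0057877')
  AND array_contains(profession, 'actor');
```

On the six sample rows above, only nm0000009 (Richard Burton) lists tt0057877, so that would be the single match.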

I also have another data set (ratings) that looks like this:

tconst averageRating   numVotes
tt0000001   5.8 1347
tt0000002   6.5 156
tt0000003   6.6 929
tt0000004   6.4 93
tt0000005   6.2 1613
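To join against the ratings, the second file can be mapped the same way as casts. A sketch, where the table name ratings and the column types are assumptions; LOCATION is omitted because the source does not give the file path:

```sql
-- Hypothetical table for the tab-separated ratings file;
-- the header line (tconst, averageRating, numVotes) is skipped.
CREATE EXTERNAL TABLE ratings (
  tconst        STRING,
  averageRating DOUBLE,
  numVotes      INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
TBLPROPERTIES ("skip.header.line.count"="1");
```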



I also want to run a query such as: show the top 10 actors who appeared, as actors, in the top-rated movies.
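This can also be sketched with built-ins only: LATERAL VIEW explode() turns each titles array into one row per title, and those rows can then be joined to the ratings. "Top rated" is not defined in the question, so the cutoffs below (at least 10000 votes, the 100 highest averageRating values) are hypothetical, as is the ratings table name; "top 10 actors" is read here as the actors who appear in the most of those titles.

```sql
-- Hypothetical definition of "top rated": the 100 best-rated titles
-- among those with at least 10000 votes.
WITH top_titles AS (
  SELECT tconst
  FROM ratings
  WHERE numVotes >= 10000
  ORDER BY averageRating DESC
  LIMIT 100
)
SELECT a.id, a.name, COUNT(*) AS top_title_count
FROM (
  -- One row per (person, title) pair, actors only.
  SELECT c.id, c.name, t.title
  FROM casts c
  LATERAL VIEW explode(c.titles) t AS title
  WHERE array_contains(c.profession, 'actor')
) a
JOIN top_titles tt ON a.title = tt.tconst
GROUP BY a.id, a.name
ORDER BY top_title_count DESC
LIMIT 10;
```

The explode is kept in a subquery so the join operates on a plain derived table. Adding OR array_contains(c.profession, 'actress') to the inner WHERE would include actresses as well.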

Is there a way to do the above in Hive (preferably without a UDF)?

Thanks!

0 answers