Counting the records in an Avro file using Pig

Date: 2015-10-29 19:19:28

Tags: hadoop apache-pig avro

I can open an Avro file in HUE, and HUE tells me it contains 10 records. I can browse all 10 of those records in HUE.

Now I run the following Pig script:

data = LOAD '/user/admin/2015/10/04/02/file1.avro' USING AvroStorage();
data_group = GROUP data ALL;
row_count = FOREACH data_group GENERATE COUNT(data);
dump row_count;
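
A variant worth comparing, offered as a hedged sketch rather than a confirmed fix: according to the Pig documentation, the built-in COUNT ignores any tuple whose first field is null, while COUNT_STAR counts every tuple. If some records have a null first field, the two numbers below would differ. Note, however, that the job log also reports only 4 records read from the input, so null handling may not be the whole explanation. The relation and field names (counts, non_null_first, all_rows) are illustrative; the path is the one from the question.

```pig
-- Load the same file as above with AvroStorage
data = LOAD '/user/admin/2015/10/04/02/file1.avro' USING AvroStorage();
-- Put every record into a single group so we get a global count
data_group = GROUP data ALL;
-- COUNT skips tuples whose first field is null; COUNT_STAR counts every tuple
counts = FOREACH data_group GENERATE COUNT(data) AS non_null_first, COUNT_STAR(data) AS all_rows;
DUMP counts;
```

If all_rows also comes out lower than 10, the records are being dropped at load time rather than by COUNT.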

The job output is:

Input(s):
Successfully read 4 records (58507 bytes) from: "/user/admin/2015/10/04/02/file1.avro"

Output(s):
Successfully stored 1 records (6 bytes) in: "hdfs://nn1/tmp/temp-268177355/tmp915757783"

Counters:
Total records written : 1
Total bytes written : 6
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1438959478020_940907


2015-10-29 19:08:55,252 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2015-10-29 19:08:55,252 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-10-29 19:08:55,253 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-10-29 19:08:55,261 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-10-29 19:08:55,261 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(4)

How did 10 become 4? Is there a different way to count the records in an Avro file using Pig?
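
One way to narrow this down, sketched here under the assumption that the file is small enough to print: compare the schema Pig infers and the tuples it actually loads against what HUE shows. DESCRIBE and DUMP are standard Pig operators; the path is the one from the question.

```pig
-- Load the same file with AvroStorage
data = LOAD '/user/admin/2015/10/04/02/file1.avro' USING AvroStorage();
-- Show the schema Pig inferred from the Avro file
DESCRIBE data;
-- Print every tuple Pig actually loaded, to compare with the 10 records shown in HUE
DUMP data;
```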
