CPU time taken by an Apache Pig query in Pig Latin

Time: 2014-11-07 12:13:38

Tags: hadoop apache-pig

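How can I measure how long an Apache Pig query takes to execute? The query, written in Pig Latin, retrieves records from up to 4 million tuples (rows) with 43 fields each. My script is: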

    A = LOAD '/user/PigTest/year_14/mon_nov/6_sms_03_01.csv' USING PigStorage(',');
    bt = foreach A generate $0 as id,$3;
    dump bt;
    ct = filter bt by id == 3981042 ;
    dump ct;
    dump MinutesBetween(CurrentTime(),$ti);

I run the script as:

    pig -param ti='date' try.pig

I am running on Linux.

The error is: ERROR 1200: mismatched input '(' expecting RIGHT_PAREN

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. mismatched input '(' expecting RIGHT_PAREN
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1725)
        at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1420)
        at org.apache.pig.PigServer.parseAndBuild(PigServer.java:364)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:389)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:375)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:170)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:232)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:608)
        at org.apache.pig.Main.main(Main.java:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    Caused by: Failed to parse: mismatched input '(' expecting RIGHT_PAREN

1 answer:

Answer 0 (score: 0)

There are two problems here:

1. A DUMP statement can only print a relation, but you are trying to print the result of the function MinutesBetween(). If you remove the last line, the error goes away.
2. On the command line you are passing the literal string 'date' as the parameter. In Pig, 'date' is not a built-in command, so you need to supply the date in at least one of the formats that Pig supports.

For example, I am using the format '2014-11-06T06:01:13'; more date formats are listed in the Pig documentation.

On the command line:

    pig -param ti='2014-11-06T06:01:13' -f try.pig

Then change the last line of the Pig script like this:

    test = FOREACH ct GENERATE MinutesBetween(CurrentTime(), ToDate('$ti'));
    DUMP test;
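Rather than typing the timestamp by hand, the ti parameter can be filled in by the shell at launch time. This is a minimal sketch, assuming a date command that supports the strftime-style format below (GNU and BSD date both do) and that Pig's single-argument ToDate parses ISO 8601 strings such as 2014-11-06T06:01:13. The helper name pig_timestamp is hypothetical. The value is in the machine's local time zone, so it lines up with CurrentTime() only if the Pig JVM uses the same default time zone.

```shell
#!/bin/bash
# Hypothetical helper: print the current local time in the ISO 8601 form
# used above (e.g. 2014-11-06T06:01:13), which ToDate('$ti') can parse.
pig_timestamp() {
  date +"%Y-%m-%dT%H:%M:%S"
}

# Pass the launch time to the Pig script as the ti parameter, if pig is installed.
if command -v pig >/dev/null 2>&1; then
  pig -param ti="$(pig_timestamp)" -f try.pig
else
  echo "pig not found on PATH; would run: pig -param ti=$(pig_timestamp) -f try.pig" >&2
fi
```

With this, MinutesBetween(CurrentTime(), ToDate('$ti')) measures whole minutes relative to the moment the script was launched.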

**Update**

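Create a shell script, say test.sh, that:

1. Gets the current time (start_time).
2. Runs the Pig script (try.pig).
3. Gets the current time again (end_time).
4. Computes and prints the difference, which is the actual elapsed (wall-clock) time the Pig script took. You can modify the script to include hours and milliseconds.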

**test.sh**

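    #!/bin/bash
    # Record the start time in seconds since the Unix epoch
    START_TIME=$(date +"%s")

    # Run the Pig script in local mode
    pig -x local try.pig

    # Record the end time and compute the elapsed wall-clock seconds
    END_TIME=$(date +"%s")
    DIFF=$((END_TIME - START_TIME))
    echo "$((DIFF / 60)) minutes and $((DIFF % 60)) seconds."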

Sample output:

0 minutes and 2 seconds.
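The update notes that the script can be extended to hours and milliseconds. One way to do that, sketched here, is to read the clock in nanoseconds with date +%s%N and convert to milliseconds. This assumes GNU coreutils date: %N is not POSIX, and BSD/macOS date does not support it. The function names format_duration_ms and now_ms are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: format a duration given in milliseconds as
# hours, minutes, seconds and milliseconds.
format_duration_ms() {
  local ms=$1
  local h=$(( ms / 3600000 ))
  local m=$(( (ms % 3600000) / 60000 ))
  local s=$(( (ms % 60000) / 1000 ))
  local rest=$(( ms % 1000 ))
  echo "$h hours, $m minutes, $s seconds and $rest milliseconds."
}

# Hypothetical helper: current time in milliseconds since the epoch.
# Assumes GNU date, where %N prints the nanoseconds within the second.
now_ms() {
  echo $(( $(date +%s%N) / 1000000 ))
}

START_MS=$(now_ms)
if command -v pig >/dev/null 2>&1; then
  pig -x local try.pig
else
  echo "pig not found on PATH; timing nothing" >&2
fi
END_MS=$(now_ms)
format_duration_ms $(( END_MS - START_MS ))
```

For instance, a duration of 3723456 ms is printed as "1 hours, 2 minutes, 3 seconds and 456 milliseconds."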
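The test.sh script above measures elapsed wall-clock time, while the question title asks about CPU time. A minimal sketch using bash's built-in time keyword, whose report is controlled by the TIMEFORMAT variable (%R real, %U user CPU, %S system CPU seconds; the digit sets the decimal places). Caveat: this only counts CPU used by the local pig process and the child processes it waits for. In -x local mode that includes the query work, but in MapReduce mode most of the CPU is spent in tasks on the cluster, which Hadoop reports in its job counters instead. The function name measure_cpu is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command and print real (elapsed), user CPU and
# system CPU seconds, using bash's built-in `time` keyword.
measure_cpu() {
  # %3R/%3U/%3S = real/user/sys seconds with 3 decimal places
  local TIMEFORMAT='real=%3R user=%3U sys=%3S'
  # `time` prints its report on stderr once the command finishes
  time "$@"
}

if command -v pig >/dev/null 2>&1; then
  measure_cpu pig -x local try.pig
else
  echo "pig not found on PATH; usage: measure_cpu pig -x local try.pig" >&2
fi
```

Because the report goes to stderr, it can be captured with something like out=$(measure_cpu pig -x local try.pig 2>&1), though that also captures Pig's own log output.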