I followed the steps below. zeropad.py is my Python script:
#!/usr/bin/python
from org.apache.pig.scripting import *
@outputSchema('time:int')
def zero():
    time.zfill(4)
=======================================
grunt> REGISTER 'zeropad.py' USING org.apache.pig.scripting.jython.JythonScriptEngine AS myfuncs;
==============================
Airlines_data_schema = LOAD 'AirlinesData_sample-1.csv' USING PigStorage('\t') AS (Year,Month,DayofMonth,DayofWeek,DepTime_actual:int,CRSDeptime:int,Arrtime_actual:int,CRSArrtime:int,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay);
===================================================
airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero.DepTime_actual AS DepTime_actual_new,myfuncs.zero.CRSDeptime AS CRSDeptime_new,myfuncs.zero.Arrtime_actual AS Arrtime_actual_new,myfuncs.zero.CRSArrtime AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;
I get the following error:
2017-02-26 19:37:19,606 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
Invalid field projection. Projected field [myfuncs] does not exist in schema: Year:bytearray,Month:bytearray,DayofMonth:bytearray,DayofWeek:bytearray,DepTime_actual:int,CRSDeptime:int,Arrtime_actual:int,CRSArrtime:int,UniqueCarrier:bytearray,FlightNum:bytearray,TailNum_Plane:bytearray,ActualElapsedTime:bytearray,CRSElapsedTime:bytearray,Airtime:bytearray,Arrdelay:bytearray,Depdelay:bytearray,Origin:bytearray,Dest:bytearray,Distance:bytearray,Taxiin:bytearray,Taxiout:bytearray,Cancelled:bytearray,CancellationCode:bytearray,Diverted:bytearray,CarrierDelay:bytearray,WeatherDelay:bytearray,NASDelay:bytearray,SecurityDelay:bytearray,LateAircraftDelay:bytearray
I'd like to know why I can't use my Python function to manipulate my column values.
Answer 0 (score: 0)
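A UDF has to be called with its argument in parentheses. Written as myfuncs.zero.DepTime_actual, the expression is parsed as a field projection, which is why Pig reports that the field [myfuncs] does not exist in the schema. Try the following syntax: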
airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,myfuncs.zero(DepTime_actual) AS DepTime_actual_new,myfuncs.zero(CRSDeptime) AS CRSDeptime_new,myfuncs.zero(Arrtime_actual) AS Arrtime_actual_new,myfuncs.zero(CRSArrtime) AS CRSArrtime_new,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay;
Answer 1 (score: 0)
Got it working with the small corrections below:
#!/usr/bin/python
@outputSchema("num:int")
def zero(time):
    return time.zfill(4)
REGISTER '/home/Jig13517/zeropad.py' using jython AS func ;
airlines_new = FOREACH Airlines_data_schema GENERATE Year,Month,DayofMonth,DayofWeek,func.zero(Airlines_data_schema.DepTime_actual) AS DepTime_actual_new:int,func.zero(Airlines_data_schema.CRSDeptime) AS CRSDeptime_new:int,func.zero(Airlines_data_schema.Arrtime_actual) AS Arrtime_actual_new:int,func.zero(Airlines_data_schema.CRSArrtime) AS CRSArrtime_new:int,UniqueCarrier,FlightNum,TailNum_Plane,ActualElapsedTime,CRSElapsedTime,Airtime,Arrdelay,Depdelay,Origin,Dest,Distance,Taxiin,Taxiout,Cancelled,CancellationCode,Diverted,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay ;
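One thing that may still be worth checking: the time columns are declared as int in the LOAD schema, and a Python int has no zfill method, while an int result cannot keep leading zeros. Below is a minimal sketch of a padding function that converts the value to a string first and declares a chararray result. The name zero_pad, the alias padded, and the no-op outputSchema fallback (there only so the function can be tried outside Pig) are my own assumptions, not part of the posts above, and I have not run this inside Pig itself.

```python
#!/usr/bin/python

# Under Pig's jython engine the outputSchema decorator should already be
# defined. This fallback is a hypothetical stand-in so the script can also
# be run with a plain Python interpreter for a quick local check.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


# chararray rather than int, because an int cannot hold leading zeros.
@outputSchema('padded:chararray')
def zero_pad(time):
    # Pass nulls through instead of failing on empty fields.
    if time is None:
        return None
    # str() makes this work whether the value arrives as an int or a string.
    return str(time).zfill(4)
```

With this, zero_pad(5) returns '0005' and zero_pad(1430) returns '1430'. In the Pig script the call would look like func.zero_pad(DepTime_actual), and the AS clause would then need a chararray type instead of int.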