Will a function be called multiple times even if the argument is the same on consecutive calls?

Time: 2017-09-26 12:26:01

Tags: performance azure-data-lake u-sql

Suppose I have an assembly named MyAssembly whose class MyClass has a method MyFunction(long timestamp), which returns the date and time as a string in the format YYYY-MM-DD HH24:mm:ss. If I write a job script like this:

@outputData =
SELECT MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(0,4) AS Year
      ,MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(...) AS Month
      ,MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(...) AS Day
      ,MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(...) AS Hour
      ,MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(...) AS Minute
      ,MyAssembly.MyClass.MyFunction(t1.timestamp).Substring(...) AS Second
 FROM @queryInput AS t1;

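Will the function be called multiple times, or is the system "smart enough" to call it only once and reuse the return value for the other columns? If not, what are my options?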

1 answer:

Answer 0: (score: 1)

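I'm not sure whether ADLA (Azure Data Lake Analytics) is "smart enough" to handle your case, but you could try a custom processor instead. It runs your method once for each row it processes, so you can call the function a single time and reuse the result for every output column.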

Your Process method should look something like this:

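public override IRow Process(IRow input, IUpdatableRow output)
{
     // MyFunction takes a long (per the question), so read the column as a long.
     long timestamp = input.Get<long>("timestamp");
     // Call the function once per row and reuse the result for every output column.
     var myFunctionResult = MyAssembly.MyClass.MyFunction(timestamp);

     output.Set<string>("Year", myFunctionResult.Substring(0,4));
     output.Set<string>("Month", myFunctionResult.Substring(...));
     // Set the remaining columns (Day, Hour, Minute, Second) the same way.
     return output.AsReadOnly();
}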

In U-SQL, your call should look something like this:

@outputData = 
    PROCESS @queryInput
    PRODUCE Year string,
         Month string,
         ...
    REQUIRED timestamp
    USING new MyAssembly.MyProcessor();
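
To show where that Process method lives, here is a minimal sketch of the whole processor class. It assumes the assembly references Microsoft.Analytics.Interfaces, that the timestamp column is a long (matching the MyFunction(long timestamp) signature in the question), and that the returned string always has the fixed-width layout YYYY-MM-DD HH24:mm:ss, which is where the Substring offsets come from. The class name MyProcessor is taken from the USING clause above; treat the rest as an illustration rather than a tested implementation.

```csharp
using Microsoft.Analytics.Interfaces;

namespace MyAssembly
{
    // Sketch only: assumes MyClass.MyFunction(long) from the question
    // is defined in this same assembly.
    [SqlUserDefinedProcessor]
    public class MyProcessor : IProcessor
    {
        public override IRow Process(IRow input, IUpdatableRow output)
        {
            // Assumption: the "timestamp" column is stored as a long.
            long timestamp = input.Get<long>("timestamp");

            // Called once per row; every output column reuses this string.
            string formatted = MyClass.MyFunction(timestamp);

            // Offsets assume the fixed-width layout "YYYY-MM-DD HH24:mm:ss".
            output.Set<string>("Year",   formatted.Substring(0, 4));
            output.Set<string>("Month",  formatted.Substring(5, 2));
            output.Set<string>("Day",    formatted.Substring(8, 2));
            output.Set<string>("Hour",   formatted.Substring(11, 2));
            output.Set<string>("Minute", formatted.Substring(14, 2));
            output.Set<string>("Second", formatted.Substring(17, 2));

            return output.AsReadOnly();
        }
    }
}
```

With a class like this, the PRODUCE clause would need to declare all six columns as string so that each output.Set call has a matching output column.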