Extracting aggregator values in batch execution

Date: 2016-07-28 20:32:08

Tags: java batch-processing google-cloud-dataflow

Is there a way to programmatically extract the final values of the aggregators after a Dataflow batch execution?

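Based on the DirectPipelineRunner class, I wrote the following method. It seems to work, but for dynamically created counters it returns values that differ from the ones shown in the console output.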

P.S. In case it helps, I am assuming the aggregators are based on Long values with a sum function.

public static Map<String, Object> extractAllCounters(Pipeline p, PipelineResult pr)
{
    AggregatorPipelineExtractor aggregatorExtractor = new AggregatorPipelineExtractor(p);
    Map<String, Object> results = new HashMap<>();

    for (Map.Entry<Aggregator<?, ?>, Collection<PTransform<?, ?>>> e :
            aggregatorExtractor.getAggregatorSteps().entrySet()) {
        Aggregator agg = e.getKey();
        try {
            results.put(agg.getName(), pr.getAggregatorValues(agg).getTotalValue(agg.getCombineFn()));
        } catch(AggregatorRetrievalException|IllegalArgumentException aggEx) {
            //System.err.println("Can't extract " + agg.getName() + ": " + aggEx.getMessage());
        }
    }

    return results;
}

1 answer:

Answer 0 (score: 2)

The aggregator values should be available in the PipelineResult. For example:

CountOddsFn countOdds = new CountOddsFn();
pipeline
    .apply(Create.of(1, 3, 5, 7, 2, 4, 6, 8, 10, 12, 14, 20, 42, 68, 100))
    .apply(ParDo.of(countOdds));
PipelineResult result = pipeline.run();

// Here you may need to use the BlockingDataflowPipelineRunner

AggregatorValues<Integer> values = result.getAggregatorValues(countOdds.aggregator);
Map<String, Integer> valuesAtSteps = values.getValuesAtSteps();
// Now read the values from the step...

Here CountOddsFn is an example DoFn that reports the aggregator.
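For the "read the values from the step" part, here is a minimal sketch of one way to combine the per-step values into a single total, assuming the aggregator uses a sum function as described in the question. It uses only the standard library: the map below stands in for what getValuesAtSteps() returns, and the class name AggregatorTotals, the helper totalAcrossSteps, the step name, and the count are all hypothetical, chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggregatorTotals {

    // Hypothetical helper: adds up the per-step values of a sum-based aggregator.
    static long totalAcrossSteps(Map<String, Integer> valuesAtSteps) {
        long total = 0;
        for (Integer value : valuesAtSteps.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for values.getValuesAtSteps(); the step name and count are made up.
        Map<String, Integer> valuesAtSteps = new LinkedHashMap<>();
        valuesAtSteps.put("ParDo(CountOdds)", 4);

        // Print each step's value, then the combined total.
        for (Map.Entry<String, Integer> step : valuesAtSteps.entrySet()) {
            System.out.println(step.getKey() + " -> " + step.getValue());
        }
        System.out.println("total = " + totalAcrossSteps(valuesAtSteps));
    }
}
```

If the same aggregator is reported from several steps, the map would hold one entry per step, and a sum-based aggregator's overall value would be the sum of those entries.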