Unable to serialize com.google.api.services.bigquery.Bigquery$Tables

Date: 2017-08-29 14:20:31

Tags: google-cloud-dataflow apache-beam

I am using Bigquery.Tables to fetch the schema of a BigQuery table from inside a DoFn, passing the initialized tableRequest in as a constructor parameter, as shown below:

private static class FetchSchema extends DoFn<String,List<String>>{
    Bigquery.Tables tableRequest;
    ValueProvider<String> DestTableName;
    ValueProvider<String> mapCols;
    ValueProvider<String> recATableName;

    public FetchSchema(Bigquery.Tables tableReq,ValueProvider<String> table,ValueProvider<String> mCols,ValueProvider<String> recATab){
        this.tableRequest = tableReq;
        this.DestTableName = table;
        this.mapCols = mCols;
        this.recATableName = recATab;
    }
    private List<String> getTableParams(String tableString) throws IOException{
        String[] tableParams = new String[3];
        List<String> tableParamsList = new ArrayList<String>();
        tableParams[0] = tableString.substring(0,tableString.indexOf(":"));
        tableParams[1] = tableString.substring(tableString.indexOf(":")+1,tableString.indexOf("."));
        tableParams[2] = tableString.substring(tableString.indexOf(".")+1);
        Table table = tableRequest.get(tableParams[0],tableParams[1],tableParams[2]).execute();
        List<TableFieldSchema> fields = table.getSchema().getFields();
        for(int i = 0; i < fields.size(); i++){
            tableParamsList.add(fields.get(i).getName());
            tableParamsList.add(fields.get(i).getDescription());
        }
        return tableParamsList;
    }
    @ProcessElement
    public void processElement(ProcessContext c) throws IOException{
        String[] mCols = mapCols.get().split(",");
        List<String> mapColsList = Arrays.asList(mCols);
        c.output(getTableParams(DestTableName.get()));
        c.output(getTableParams(recATableName.get()));
        c.output(mapColsList);
    }
}

But I am getting this error:

An exception occured while executing the Java class. null: InvocationTargetException: unable to serialize org.apache.beam.examples.flatFileTest$FetchSchema@6510b00e: com.google.api.services.bigquery.Bigquery$Tables

Can anyone help?

1 Answer:

Answer 0 (score: 0):

A BigQuery client created on your local machine is of no use to the workers that execute the pipeline on Dataflow. Instead, you should create the Bigquery.Tables client in a @StartBundle method of your DoFn. That method can take a StartBundleContext parameter, which lets you call getPipelineOptions().

Note: ideally this would be a @Setup method, so the client could be reused across bundles, but the pipeline options do not seem to be available there.
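
A minimal sketch of that approach (assumptions not in the original answer: application-default credentials via GoogleCredential, the standard google-api-client Bigquery.Builder, and the placeholder application name; adapt the client construction to however the original Bigquery client was created):

import java.util.List;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.BigqueryScopes;
import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.transforms.DoFn;

private static class FetchSchema extends DoFn<String, List<String>> {
    // transient: rebuilt on each worker in @StartBundle, never serialized with the DoFn
    private transient Bigquery.Tables tableRequest;
    private final ValueProvider<String> destTableName;
    private final ValueProvider<String> mapCols;
    private final ValueProvider<String> recATableName;

    public FetchSchema(ValueProvider<String> table,
                       ValueProvider<String> mCols,
                       ValueProvider<String> recATab) {
        this.destTableName = table;
        this.mapCols = mCols;
        this.recATableName = recATab;
    }

    @StartBundle
    public void startBundle(StartBundleContext context) throws Exception {
        // Pipeline options are reachable here if the client needs them, e.g.
        // context.getPipelineOptions().as(GcpOptions.class)
        GoogleCredential credential = GoogleCredential.getApplicationDefault()
                .createScoped(BigqueryScopes.all());
        Bigquery bigquery = new Bigquery.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                credential)
            .setApplicationName("fetch-schema") // placeholder name
            .build();
        tableRequest = bigquery.tables();
    }

    // getTableParams and processElement stay as in the question; they use the
    // per-bundle tableRequest built above instead of a constructor-supplied one.
}

Because the field is transient and only populated in @StartBundle, the Bigquery$Tables instance is never part of the DoFn's serialized state, which avoids the serialization error above.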