How do I run a dynamic second query in Google Cloud Dataflow?

Asked: 2018-12-10 21:31:06

Tags: google-bigquery google-cloud-dataflow apache-beam

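I am trying to do the following: get a list of IDs with one query, convert it to a comma-separated string (e.g. "1,2,3"), and then use that string in a second query. When I try to run the second query, I get a syntax error:

"Target type of a lambda conversion must be an interface"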

String query = "SELECT DISTINCT campaignId FROM `" + options.getEligibilityInputTable() + "` ";

    Pipeline p = Pipeline.create(options);
    p.apply("GetCampaignIds", BigQueryIO.readTableRows().withTemplateCompatibility().fromQuery(query).usingStandardSql())
      .apply("TransformCampaignIds",
        MapElements.into(TypeDescriptors.strings())
        .via((TableRow row) -> (String)row.get("campaignId")))
      .apply(Combine.globally(new StringToCsvCombineFn()))
      .apply("GetAllCampaigns", campaignIds -> BigQueryIO.readTableRows().withTemplateCompatibility().fromQuery("SELECT id AS campaignId, dataQuery FROM `{projectid}.mysql_standard.campaigns` WHERE campaignId IN (" + campaignIds + ")").usingStandardSql())
....

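How can I chain the queries together?

1 Answer:

Answer 0: (score: 1)

Unfortunately, you cannot do this with the existing sources: the query given to BigQueryIO is fixed when the pipeline is constructed, so it cannot depend on the contents of an upstream PCollection. You have two options: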

  • Call the BQ API manually from a ParDo.
  • Write a more complex SQL query that does this for you.
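
For the first option, the part worth getting right is the query text that the ParDo would send. Below is a minimal sketch of that step only, under a few assumptions: SecondQueryBuilder and buildCampaignQuery are hypothetical names, campaign IDs are assumed to be integers, and the filter is assumed to run against the id column behind the campaignId alias. Inside a DoFn's @ProcessElement method you would then pass the resulting string to a BigQuery client (for example BigQuery.query with a QueryJobConfiguration from the google-cloud-bigquery library) and output each returned row.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns the combined ID string into the text of the second query. */
public class SecondQueryBuilder {

    /**
     * Builds the campaigns query for a CSV string such as "1,2,3".
     * Assumption: every ID is an integer; parsing each one keeps stray text out of the SQL.
     */
    static String buildCampaignQuery(String campaignsTable, String csvIds) {
        List<String> ids = new ArrayList<>();
        for (String part : csvIds.split(",")) {
            String id = part.trim();
            if (id.isEmpty()) {
                continue; // tolerate input like "1,,2"
            }
            // Long.parseLong throws NumberFormatException on non-numeric input.
            ids.add(String.valueOf(Long.parseLong(id)));
        }
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("no campaign IDs supplied");
        }
        // Assumption: filter on `id`, the source column behind the campaignId alias.
        return "SELECT id AS campaignId, dataQuery FROM `" + campaignsTable + "` "
             + "WHERE id IN (" + String.join(", ", ids) + ")";
    }

    public static void main(String[] args) {
        // The table name here is a placeholder, not a real project.
        System.out.println(buildCampaignQuery("my-project.mysql_standard.campaigns", " 1,2,3"));
    }
}
```

Keep in mind that a query issued this way runs on the workers, once per element reaching the ParDo, so after Combine.globally it would run once per window.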

The second option would look something like this:

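// Java string literals have no backslash line continuation, so concatenate instead.
// The outer WHERE filters on id: a SELECT alias (campaignId) is not visible in WHERE.
String query = "SELECT id AS campaignId, dataQuery "
             + "FROM `{projectid}.mysql_standard.campaigns` "
             + "WHERE id IN ( "
             + "    SELECT DISTINCT campaignId "
             + "    FROM `" + options.getEligibilityInputTable() + "`)";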

Pipeline p = Pipeline.create(options);
p.apply("GetAllCampaigns", BigQueryIO.readTableRows()
                                     .withTemplateCompatibility()
                                     .fromQuery(query)
                                     .usingStandardSql());