This is the record layout of the table (load_history). I am trying to filter it with Standard SQL (since Legacy SQL may be deprecated at some point):
[
{
"mode": "NULLABLE",
"name": "Job",
"type": "RECORD",
"fields": [
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "start_time",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "end_time",
"type": "TIMESTAMP"
}
]
},
{
"mode": "REPEATED",
"name": "source",
"type": "RECORD",
"description": "source tables touched by this job",
"fields": [
{
"mode": "NULLABLE",
"name": "database",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "schema",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "table",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "partition_time",
"type": "TIMESTAMP"
}
]
}
]
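I need to filter and select only the records whose "source" array contains an entry in which the "schema" and "table" fields both match given values (for example, schema = 'log' and table = 'customer' in the same array entry).
The following works only when filtering on a single field of the struct (here, the schema name):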
select name, array(select x from unnest(schema) as x where x ='log' ), table
from (select job.name , array(select schema from unnest(source)) as schema,
array(select table from unnest(source)) as table
from config.load_history)
However, I cannot filter on a combination of fields within the same array entry.
Any help is much appreciated.
Answer 0 (score: 4)
For BigQuery Standard SQL:
#standardSQL
SELECT data
FROM data, UNNEST(source) AS s
WHERE (s.schema, s.table) = ('log', 'customer')
or
#standardSQL
SELECT *
FROM data
WHERE EXISTS (
SELECT 1 FROM UNNEST(source) AS s
WHERE (s.schema, s.table) = ('log', 'customer')
)
You can test and play with this using the dummy data below:
#standardSQL
WITH data AS (
SELECT
STRUCT<name STRING, start_time INT64, end_time INT64>('jobA', 1, 2) AS job,
[STRUCT<database STRING, schema STRING, table STRING, partition_time INT64>
('d1', 's1', 't1', 1),
('d1', 's2', 't2', 2),
('d1', 's3', 't3', 3)
] AS source UNION ALL
SELECT
STRUCT<name STRING, start_time INT64, end_time INT64>('jobB', 1, 2) AS job,
[STRUCT<database STRING, schema STRING, table STRING, partition_time INT64>
('d1', 's1', 't1', 1),
('d2', 's4', 't2', 2),
('d2', 's3', 't3', 3)
] AS source
)
SELECT *
FROM data
WHERE EXISTS (
SELECT 1 FROM UNNEST(source) AS s
WHERE (s.schema, s.table) = ('s2', 't2')
)
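The two forms above are not interchangeable when a row has more than one matching array entry. The sketch below is an illustration only: the inline rows and the column aliases rows_from_join and rows_from_exists are made up for this example and are not part of the original answer. It counts what each form returns for a single job whose source array contains two 'log'/'customer' entries.

```sql
#standardSQL
-- Hypothetical sample: one job with two matching entries in its source array.
WITH data AS (
  SELECT
    STRUCT('jobA' AS name) AS job,
    [STRUCT('d1' AS database, 'log' AS schema, 'customer' AS `table`),
     STRUCT('d2' AS database, 'log' AS schema, 'customer' AS `table`)] AS source
)
SELECT
  -- Comma join with UNNEST: one output row per matching array entry.
  (SELECT COUNT(*)
   FROM data, UNNEST(source) AS s
   WHERE (s.schema, s.table) = ('log', 'customer')) AS rows_from_join,
  -- EXISTS: each row of data is kept at most once.
  (SELECT COUNT(*)
   FROM data
   WHERE EXISTS (
     SELECT 1 FROM UNNEST(source) AS s
     WHERE (s.schema, s.table) = ('log', 'customer'))) AS rows_from_exists
```

With this sample the join form should count 2 and the EXISTS form 1, so EXISTS is the safer choice when each record must appear only once.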
Answer 1 (score: 1)
It sounds like you want something like this:
SELECT
job.name,
ARRAY(SELECT schema FROM UNNEST(matching_sources)) AS matching_schemas,
ARRAY(SELECT table FROM UNNEST(matching_sources)) AS matching_tables
FROM (
SELECT *,
ARRAY(SELECT AS STRUCT * FROM UNNEST(source)
WHERE schema = 'log' AND `table` = 'customer') AS matching_sources
FROM YourTable
)
WHERE ARRAY_LENGTH(matching_sources) > 0;
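This returns an array of schemas and an array of tables built from the entries that match the condition, and excludes rows whose array has no matching entry.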
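To try the idea without access to the real table, here is a minimal sketch with inline data. The two sample jobs and their array entries are invented for illustration, and YourTable is defined as a CTE standing in for the actual table; only the schema and table fields of source are modeled.

```sql
#standardSQL
-- Hypothetical sample rows standing in for the real table.
WITH YourTable AS (
  SELECT
    STRUCT('jobA' AS name) AS job,
    [STRUCT('log' AS schema, 'customer' AS `table`),
     STRUCT('log' AS schema, 'orders' AS `table`)] AS source
  UNION ALL
  SELECT
    STRUCT('jobB' AS name) AS job,
    [STRUCT('audit' AS schema, 'customer' AS `table`)] AS source
)
SELECT
  job.name,
  matching_sources
FROM (
  SELECT *,
    -- Keep only entries where both fields match within the same entry.
    ARRAY(SELECT AS STRUCT * FROM UNNEST(source)
          WHERE schema = 'log' AND `table` = 'customer') AS matching_sources
  FROM YourTable
)
-- Drop jobs that have no matching entry at all.
WHERE ARRAY_LENGTH(matching_sources) > 0;
```

Here jobA should be returned with a single matching entry, while jobB is dropped because its only entry matches the table name but not the schema.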
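Answer 2 (score: 0)
I need to filter and select only the records whose "source" array contains an entry in which the "schema" & "table" fields match given values
This sounds like it can be solved with a simple WHERE clause, like so: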
WITH data AS(
select STRUCT<name STRING, start_time TIMESTAMP, end_time TIMESTAMP> ('job_1', TIMESTAMP("2017-06-10"), TIMESTAMP("2017-06-11")) Job, ARRAY<STRUCT<database STRING, schema STRING, table STRING, partition_time TIMESTAMP> > [STRUCT('database_1', "schema_1", "table_1", TIMESTAMP("2017-06-10")), STRUCT('database_1', "schema_1", "table_2", TIMESTAMP("2017-06-10")), STRUCT('database_1', "schema_3", "table_1", TIMESTAMP("2017-06-10")), STRUCT('database_2', "schema_2", "table_2", TIMESTAMP("2017-06-10"))] source union all
select STRUCT<name STRING, start_time TIMESTAMP, end_time TIMESTAMP> ('job_2', TIMESTAMP("2017-06-10"), TIMESTAMP("2017-06-11")) Job, ARRAY<STRUCT<database STRING, schema STRING, table STRING, partition_time TIMESTAMP> > [STRUCT('database_2', "schema_2", "table_2", TIMESTAMP("2017-06-10")), STRUCT('database_2', "schema_2", "table_3", TIMESTAMP("2017-06-10")), STRUCT('database_1', "schema_1", "table_3", TIMESTAMP("2017-06-10"))] source
)
SELECT
*
FROM data
WHERE EXISTS(SELECT 1 FROM UNNEST(source) WHERE schema = "schema_2" AND table = "table_2")
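This returns every row whose source array contains at least one entry with the given schema and the given table.
If you also want the output to keep only the array entries that match the filter, you can run this as well: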
SELECT
job.*,
ARRAY(SELECT AS STRUCT database, schema, table, partition_time FROM UNNEST(source) WHERE schema = "schema_2" AND table = "table_2") filtered_data
FROM data
WHERE EXISTS(SELECT 1 FROM UNNEST(source) WHERE schema = "schema_2" AND table = "table_2")
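I am not sure this matches your question exactly, but it may give you an idea of how to filter values out of an ARRAY.
Answer 3 (score: 0)
Mikhail Berlyant (https://stackoverflow.com/users/5221944/mikhail-berlyant) explained this very well. I used his first example: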
SELECT data
FROM data, UNNEST(source) AS s
WHERE (s.schema, s.table) = ('log', 'customer')
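Let me explain with my own example: suppose I want to fetch the rows from the Google public patents dataset that exactly match a specific CPC (Cooperative Patent Classification) code.
Normally I would use a LIKE condition: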
SELECT cpc
FROM
`patents-public-data.patents.publications`
where cpc like "%G01R31/007"
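That does not work here, because each cpc cell holds an array of records, for example [{'code': 'G01R31/007', 'inventive': True, 'first': False, 'tree': []}].
So I need to unnest the array into separate rows, address the code field, and compare it for equality with the exact value I want to extract, in this case G01R31/007.
The code is as follows: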
SELECT publication_number, cpc
FROM `patents-public-data.patents.publications`,
UNNEST(cpc) AS s
WHERE (s.code) = ('G01R31/007')
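The comma join above returns a publication once for every cpc entry that matches. If each publication should appear only once, the EXISTS form from the first answer can be applied to the same table. This is a sketch under that assumption, not something the original answer ran:

```sql
#standardSQL
-- Keep a publication if at least one element of its cpc array has the exact code.
SELECT publication_number, cpc
FROM `patents-public-data.patents.publications`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(cpc) AS s
  WHERE s.code = 'G01R31/007'
)
```

Because the predicate is evaluated per array element, s.code is a plain STRING here, so a pattern match such as LIKE could be used in place of the equality test.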