数组索引74超出范围(溢出)Google大查询

时间:2019-02-06 04:18:59

标签: google-bigquery

在Big Query中,我遇到了将以竖线(|)分隔的文件移至Google Big Query的问题。为了完成这项工作,我决定将每行整体作为一个字符串引入,然后执行SPLIT函数以在所需的数据类型中获取所需的字段。

在这种情况下,我有以下SELECT语句:

Select
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(0)] as INT64) AS FIELD1
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(1)] AS FIELD2
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(2)] AS FIELD3
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(3)] AS FIELD4
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(4)] AS FIELD5
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(5)] AS FIELD6
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(6)] as timestamp) AS FIELD7
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(7)] AS FIELD8
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(8)] AS FIELD9
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(9)] AS FIELD10
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(10)] AS FIELD11
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(11)] AS FIELD12
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(12)] AS FIELD13
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(13)] AS FIELD14
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(14)] AS FIELD15
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(15)] AS FIELD16
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(16)] AS FIELD17
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(17)] AS FIELD18
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(18)] AS FIELD19
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(19)] AS FIELD20
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(20)] AS FIELD21
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(21)] AS FIELD22
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(22)] AS FIELD23
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(23)] AS FIELD24
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(24)] AS FIELD25
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(25)] AS FIELD26
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(26)] AS FIELD27
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(27)] as INT64) AS FIELD28
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(28)] AS FIELD29
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(29)] AS FIELD30
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(30)] AS FIELD31
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(31)] as timestamp) AS FIELD32
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(32)] as INT64) AS FIELD33
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(33)] as INT64) AS FIELD34
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(34)] as timestamp) AS FIELD35
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(35)] as INT64) AS FIELD36
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(36)] as INT64) AS FIELD37
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(37)] as timestamp) AS FIELD38
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(38)] as numeric) AS FIELD39
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(39)] as INT64) AS FIELD40
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(40)] as INT64) AS FIELD41
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(41)] AS FIELD42
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(42)] AS FIELD43
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(43)] as INT64) AS FIELD44
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(44)] as numeric) AS FIELD45
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(45)] AS FIELD46
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(46)] as timestamp) AS FIELD47
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(47)] AS FIELD48
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(48)] as INT64) AS FIELD49
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(49)] as timestamp) AS FIELD50
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(50)] as INT64) AS FIELD51
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(51)] as INT64) AS FIELD52
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(52)] as INT64) AS FIELD53
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(53)] as INT64) AS FIELD54
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(54)] as timestamp) AS FIELD55
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(55)] as INT64) AS FIELD56
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(56)] as timestamp) AS FIELD57
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(57)] AS FIELD58
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(58)] AS FIELD59
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(59)] AS FIELD60
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(60)] AS FIELD61
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(61)] AS FIELD62
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(62)] AS FIELD63
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(63)] as FLOAT64) AS FIELD64
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(64)] as FLOAT64) AS FIELD65
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(65)] AS FIELD66
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(66)] as INT64) AS FIELD67
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(67)] as INT64) AS FIELD68
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(68)] as INT64) AS FIELD69
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(69)] AS FIELD70
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(70)] AS FIELD71
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(71)] AS FIELD72
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(72)] as timestamp) AS FIELD73
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(73)] as timestamp) AS FIELD74
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(74)] as timestamp) AS FIELD75
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(75)] as INT64) AS FIELD76
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(76)] as INT64) AS FIELD77
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(77)] as INT64) AS FIELD78
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(78)] as timestamp) AS FIELD79
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(79)] as timestamp) AS FIELD80
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(80)] as timestamp) AS FIELD81
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(81)] as timestamp) AS FIELD82
SPLIT(RECORDSTRING, '|"||"|')[OFFSET(82)] AS FIELD83
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(83)] as FLOAT64) AS FIELD84
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(84)] as FLOAT64) AS FIELD85
safe_cast(SPLIT(RECORDSTRING, '|"||"|')[OFFSET(85)] as FLOAT64) AS FIELD86
from `db.ds.tbl`

GBQ指出此查询在结构上是合理的。但显示以下响应:

Array index 74 is out of bounds (overflow) google big query

我该如何回应? 谢谢

只是要清楚。分隔符为|"||"|。例如:

6|"||"|Demo (60 Minutes) |"||"|demo60min|"||"|2017-09-13 16:30:00|"||"|2017-09-13 15:30:00

谢谢

1 个答案:

答案 0 :(得分:1)

  

数组索引74超出范围(溢出)Google大查询
  我该如何回应?

不是像下面这样在行中使用[OFFSET(NN)]

SPLIT(RECORDSTRING, '|"||"|')[OFFSET(NN)] 

您应按照以下示例使用[SAFE_OFFSET(NN)]

SPLIT(RECORDSTRING, '|"||"|')[SAFE_OFFSET(NN)]    
  

额外更新

如您所见-您的查询非常庞大,难以阅读和维护

下面是如何将查询转换为紧凑查询的简化示例

#standardSQL
SELECT 
  SAFE_CAST(item[SAFE_OFFSET(0)] AS INT64) AS FIELD1,
  item[SAFE_OFFSET(1)] AS FIELD2,
  item[SAFE_OFFSET(2)] AS FIELD3,
  item[SAFE_OFFSET(3)] AS FIELD4,
  item[SAFE_OFFSET(4)] AS FIELD5,
  item[SAFE_OFFSET(5)] AS FIELD6,
  SAFE_CAST(item[SAFE_OFFSET(6)] AS TIMESTAMP) AS FIELD7
FROM `project.dataset.table` t,
UNNEST([STRUCT<item ARRAY<STRING>>(SPLIT(RECORDSTRING, '|"||"|'))])

您可以使用问题中的示例数据对其进行测试,如下例所示

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '''6|"||"|Demo (60 Minutes) |"||"|demo60min|"||"|2017-09-13 16:30:00|"||"|2017-09-13 15:30:00''' AS RECORDSTRING
)
SELECT 
  SAFE_CAST(item[SAFE_OFFSET(0)] AS INT64) AS FIELD1,
  item[SAFE_OFFSET(1)] AS FIELD2,
  item[SAFE_OFFSET(2)] AS FIELD3,
  item[SAFE_OFFSET(3)] AS FIELD4,
  item[SAFE_OFFSET(4)] AS FIELD5,
  item[SAFE_OFFSET(5)] AS FIELD6,
  SAFE_CAST(item[SAFE_OFFSET(6)] AS TIMESTAMP) AS FIELD7
FROM `project.dataset.table` t,
UNNEST([STRUCT<item ARRAY<STRING>>(SPLIT(RECORDSTRING, '|"||"|'))])

有结果

Row FIELD1  FIELD2              FIELD3      FIELD4              FIELD5                 FIELD6      FIELD7    
1   6       Demo (60 Minutes)   demo60min   2017-09-13 16:30:00 2017-09-13 15:30:00    null        null