"Invalid json array object" error in Redshift, even though the JSON looks valid

Time: 2018-12-28 15:42:21

Tags: sql json amazon-redshift

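To set up monitoring, I am trying to get the call durations for a specific file, aggregate them, and feed them to Grafana. However, my query fails for a strange reason, even though I have double-checked everything and it all looks fine.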

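In my Redshift database I have a table called prodloadtimes, where the resource column holds a JSON array of all the calls (the network log). I pick out the call I am interested in with this code: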

              select created,
                    useragent,
                    position('"pathname":"/api/player/screenName' in resource) as pos1,
                    right( resource, len(resource) - pos1) as pathname_start_string,
                    position(',{' in pathname_start_string) as pos2,
                    left (pathname_start_string, pos2-1) as pathname_tail,
                    concat(left (resource, pos1 + pos2 - 1), ']') as string3,
                    json_array_length(string3) as arr_len,
                    json_extract_array_element_text ( string3  , arr_len-1) as commons_element,
                    is_valid_json(commons_element) as "test"

                  from prodloadtimes
                  where resource like '%"pathname":"/api/player/screenName"%'
                    and created between '2018-12-01' and '2018-12-18 12:00'
                    and pos2 > 0

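So, basically, I find the element I need in the array and cut off all the elements after it, so that the one I need becomes the last element. I then extract that element with json_extract_array_element_text (string3, arr_len-1).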
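The cut only yields a valid array if a `,{` follows the target element and that `,{` really marks the next top-level array element. A minimal diagnostic sketch, assuming the same table and columns as above and Redshift's lateral alias references as already used in the query (the CTE name `cut` and the output column names are made up for illustration), could count how many of the truncated strings Redshift considers valid JSON arrays:

```sql
-- Diagnostic sketch, not verified against the real table.
-- "cut", "has_following_element", "valid_array" and "row_count" are hypothetical names.
with cut as (
    select created,
           position('"pathname":"/api/player/screenName' in resource) as pos1,
           position(',{' in right(resource, len(resource) - pos1)) as pos2,
           -- same truncation as string3 in the question
           left(resource, pos1 + pos2 - 1) || ']' as string3
    from prodloadtimes
    where resource like '%"pathname":"/api/player/screenName"%'
      and created between '2018-12-01' and '2018-12-18 12:00'
)
select case when pos2 > 0 then 1 else 0 end as has_following_element,
       -- is_valid_json_array returns true only for a well-formed JSON array
       case when is_valid_json_array(string3) then 1 else 0 end as valid_array,
       count(*) as row_count
from cut
group by 1, 2
order by 1, 2;
```

Any rows with `valid_array = 0` would be candidates for the input that json_array_length rejects, for example when the target element is the last one in the array (so `pos2` is 0) or when a `,{` occurs inside a nested value.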

Now I extract the data I need with the following code:

  select created,
         useragent,
         json_extract_path_text(commons_element,'duration') as duration,
         json_extract_path_text(commons_element,'transferSize') as transfersize,
         json_extract_path_text(commons_element,'name', 'pathname') as pathname,
         json_extract_path_text(commons_element,'encodedBodySize') as size,
         commons_element,
         is_valid_json(commons_element) as "test"
      from (
              select created,
                    useragent,
                    position('"pathname":"/api/player/screenName' in resource) as pos1,
                    right( resource, len(resource) - pos1) as pathname_start_string,
                    position(',{' in pathname_start_string) as pos2,
                    left (pathname_start_string, pos2-1) as pathname_tail,
                    concat(left (resource, pos1 + pos2 - 1), ']') as string3,
                    json_array_length(string3) as arr_len,
                    json_extract_array_element_text ( string3  , arr_len-1) as commons_element,
                    is_valid_json(commons_element) as "test"

                  from prodloadtimes
                  where resource like '%"pathname":"/api/player/screenName"%'
                    and created between '2018-12-01' and '2018-12-18 12:00'
                    and pos2 > 0
                    )   
        where transfersize > 0 
          and duration > 0

This works fine, but when I try to compute percentiles over the data with this:

select floor(extract(epoch from created)/900)*900 AS "time",
  percentile_cont(0.5) within group (order by cast(duration as float4) asc) AS "50th",
  percentile_cont(0.75) within group (order by cast(duration as float4) asc) AS "75th",
  percentile_cont(0.95) within group (order by cast(duration as float4) asc) AS "95th"
  from (
  select created,
         useragent,
         json_extract_path_text(commons_element,'duration') as duration,
         json_extract_path_text(commons_element,'transferSize') as transfersize,
         json_extract_path_text(commons_element,'name', 'pathname') as pathname,
         json_extract_path_text(commons_element,'encodedBodySize') as size,
         commons_element,
         is_valid_json(commons_element) as "test"
      from (
              select created,
                    useragent,
                    position('"pathname":"/api/player/screenName' in resource) as pos1,
                    right( resource, len(resource) - pos1) as pathname_start_string,
                    position(',{' in pathname_start_string) as pos2,
                    left (pathname_start_string, pos2-1) as pathname_tail,
                    concat(left (resource, pos1 + pos2 - 1), ']') as string3,
                    json_array_length(string3) as arr_len,
                    json_extract_array_element_text ( string3  , arr_len-1) as commons_element,
                    is_valid_json(commons_element) as "test"

                  from prodloadtimes
                  where resource like '%"pathname":"/api/player/screenName"%'
                    and created between '2018-12-01' and '2018-12-18 12:00'
                    and pos2 > 0
                    )   
        where transfersize > 0 
          and duration > 0)
  group by "time"

Redshift responds with an error:

Amazon Invalid operation: JSON parsing error Details:

error: JSON parsing error   code: 8001   context: invalid json array object

Given that commons_element is the output of the built-in function json_extract_array_element_text, I don't see how it could be an invalid JSON object, so any help would be appreciated.
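One thing that may be worth trying: Redshift's JSON functions accept an optional trailing `null_if_invalid` argument, which makes them return NULL for malformed input instead of raising an error. The sketch below is untested against the real table and assumes the same columns as above; the CTE names `cut` and `elements` are made up for illustration.

```sql
-- Sketch: tolerate invalid JSON instead of failing with error 8001.
-- "cut" and "elements" are hypothetical CTE names.
with cut as (
    select created,
           position('"pathname":"/api/player/screenName' in resource) as pos1,
           position(',{' in right(resource, len(resource) - pos1)) as pos2,
           left(resource, pos1 + pos2 - 1) || ']' as string3
    from prodloadtimes
    where resource like '%"pathname":"/api/player/screenName"%'
      and created between '2018-12-01' and '2018-12-18 12:00'
),
elements as (
    select created,
           -- the final "true" is null_if_invalid
           json_extract_array_element_text(
               string3,
               json_array_length(string3, true) - 1,
               true) as commons_element
    from cut
    where pos2 > 0
)
select created,
       json_extract_path_text(commons_element, 'duration', true) as duration,
       json_extract_path_text(commons_element, 'transferSize', true) as transfersize
from elements
where commons_element is not null;
```

If the percentile query runs on top of this, that would suggest the JSON functions were being evaluated on some strings that are not valid arrays, rather than commons_element itself being malformed.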

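P.S. I know that all these nested inline selects are a somewhat clumsy approach, but this is a development version that I built up step by step, and I expected it to work.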

0 answers:

No answers