查找所有记录为' true'嵌套JSON中的值

时间:2016-11-17 06:51:47

标签: mysql json apache-drill

这是我的嵌套JSON:

{  
   "business_id":"pNQwnY_q4okdlnPiR-3RBA",
   "full_address":"6105 S Fort Apache Rd\nSpring Valley\nLas Vegas, NV 89148",
   "hours":{  },
   "open":true,
   "categories":[  ],
   "city":"Las Vegas",
   "review_count":68,
   "name":"Empire Bagels",
   "neighborhoods":[  
      "Spring Valley"
   ],
   "longitude":-115.298175926911,
   "state":"NV",
   "stars":3.0,
   "latitude":36.07728616051,
   "attributes":{  
      "Take-out":true,
      "Wi-Fi":"no",
      "Good For":{  
         "dessert":false,
         "latenight":false,
         "lunch":false,
         "dinner":false,
         "breakfast":true,
         "brunch":false
      },
      "Caters":true,
      "Noise Level":"quiet",
      "Takes Reservations":false,
      "Delivery":false,
      "Ambience":{  
         "romantic":false,
         "intimate":false,
         "classy":false,
         "hipster":false,
         "divey":false,
         "touristy":false,
         "trendy":false,
         "upscale":false,
         "casual":true
      },
      "Parking":{  
         "garage":false,
         "street":false,
         "validated":false,
         "lot":true,
         "valet":false
      },
      "Has TV":true,
      "Outdoor Seating":true,
      "Attire":"casual",
      "Alcohol":"none",
      "Waiter Service":false,
      "Accepts Credit Cards":true,
      "Good for Kids":true,
      "Good For Groups":true,
      "Price Range":1
   },
   "type":"business"
}

我正在查询这个使用apache练习。我想找出十大最常见的真实'一个城市中所有餐馆的属性。我希望它像:

Accepts Credit Cards : 200,
Alcohol: 300,
Good For Kids : 500

我的查询将如何?这就是我所做的:

select attributes, count(*) attributes from `yelp_dataset` group by attributes;

我收到此错误:

Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REQUIRED and MAP:REQUIRED.

Fragment 0:0

[Error Id: 8fe8a616-92c7-4da0-ab65-b5542d391f47 on 192.168.10.104:31010] (state=,code=0)

我的查询应该是什么?

1 个答案:

答案 0 :(得分:0)

由于混合数据类型,我无法使用KVGEN()自动展平属性,但您可能尝试使用一点UNION ALL强力CTE:

WITH ReviewAttributes AS (
    SELECT
      reviews.name,
      'Accepts Credit Cards' as `AttributeName`,
      CASE WHEN reviews.attributes.`Accepts Credit Cards` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM 
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Alcohol' as `AttributeName`,
      CASE WHEN reviews.attributes.`Alcohol` <> 'none' THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Good for Kids' as `AttributeName`,
      CASE WHEN reviews.attributes.`Good for Kids` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
)
SELECT 
  `AttributeName`,
  SUM(`AttributeValue`) as `AttributeCount`
FROM
  ReviewAttributes
GROUP BY
  `AttributeName`;

CASE语句也可以帮助您解决布尔字段和枚举字段之间的一些差异,例如从样本中计算Alcohol