I have defined a Hive table where a single column contains JSON text:
CREATE EXTERNAL TABLE IF NOT EXISTS my.rawdata (
json string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = '\n',
'quoteChar' = '\0',
'escapeChar' = '\r'
)
STORED AS TEXTFILE
LOCATION 's3://mydata/';
Is there a Presto/Athena query that can list out all field names that occur within the JSON and their frequency (i.e. total number of times the attribute appears in the table)?
答案 0 :(得分:2)
Use the JSON functions to parse the JSON and turn it into a map. Then extract the keys and unnest them. Finally, use a normal SQL aggregation:
SELECT key, count(*)
FROM (
SELECT map_keys(cast(json_parse(json) AS map(varchar, json))) AS keys
FROM rawdata
)
CROSS JOIN UNNEST (keys) AS t (key)
GROUP BY key
答案 1 :(得分:0)
select key
,count(*)
from t cross join
unnest (regexp_extract_all(json,'"([^"]+)"\s*:\s*("[^"]+"|[^,{}]+)',1)) u (key)
group by key
;