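How to parse a URL query string into multiple key-value pairs in Hive

Time: 2013-10-02 17:56:59

Tags: hadoop hive

I am trying to run a Hive query that produces a table of domain, key, value, and count, grouped by each unique domain/key/value combination.

Sample data: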

http://www.aaa.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue
http://www.aaa.com/path?key_a=5&key_b=goodb&key_c=yestr&key_d=blue
http://www.bbb.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue
http://www.bbb.com/path?key_a=5&key_b=goodb&key_c=ystrd

Desired output:

aaa.com | key_a | 5 | 2
aaa.com | key_b | hello | 1
aaa.com | key_b | goodb | 1
aaa.com | key_c | today | 1
aaa.com | key_c | yestr | 1
aaa.com | key_d | blue | 2
bbb.com | key_a | 5 | 2
bbb.com | key_b | hello | 1
bbb.com | key_b | goodb | 1
bbb.com | key_c | today | 1
bbb.com | key_c | ystrd | 1
bbb.com | key_d | blue | 1

Here is what I have been using:

"select parse_url(url,'HOST'), str_to_map(parse_url(url,'QUERY'),'&','='), count(1) from url_table group by select parse_url(url,'HOST'), str_to_map(parse_url(url,'QUERY'),'&','=') limit 10;"

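Where am I going wrong? In particular, I think the part I got wrong is str_to_map(parse_url(url,'QUERY'),'&','='), because I don't know how to split the query string into multiple key-value pairs and then group them correctly.

3 answers:

Answer 0 (score: 2)

You can achieve this with the help of LATERAL VIEW explode, which turns each entry of the map into its own row so that it can be grouped.

This should work: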

SELECT
  parse_url(url, 'HOST') AS host,
  v.key AS key,
  v.val,
  count(*) AS count
FROM url u
LATERAL VIEW explode(str_to_map(parse_url(url,'QUERY'),'&','=')) v AS key, val
GROUP BY parse_url(url, 'HOST'), v.key, v.val;
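To make the row-multiplying effect of explode easier to see, here is a small Python sketch that emulates the query above on the sample data from the question. It is not Hive code: the helper names str_to_map and explode_and_count are stand-ins written for this illustration, and urllib.parse.urlsplit is assumed to behave closely enough to parse_url for these URLs.

```python
from collections import Counter
from urllib.parse import urlsplit

# Sample URLs copied from the question.
URLS = [
    "http://www.aaa.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue",
    "http://www.aaa.com/path?key_a=5&key_b=goodb&key_c=yestr&key_d=blue",
    "http://www.bbb.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue",
    "http://www.bbb.com/path?key_a=5&key_b=goodb&key_c=ystrd",
]


def str_to_map(text, pair_delim="&", kv_delim="="):
    """Illustrative stand-in for Hive's str_to_map: split pairs, then key and value."""
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        # A pair without the key/value delimiter gets no value.
        result[key] = value if sep else None
    return result


def explode_and_count(urls):
    """Emit one (host, key, value) row per map entry, then count identical rows."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        if not parts.query:
            continue  # nothing to explode for URLs without a query string
        for key, value in str_to_map(parts.query).items():
            counts[(parts.hostname, key, value)] += 1
    return counts


counts = explode_and_count(URLS)
for (host, key, value), n in sorted(counts.items()):
    print(f"{host} | {key} | {value} | {n}")
```

The sketch yields one group per distinct host/key/value combination, which matches the shape of the desired output. Note that the host comes out as www.aaa.com rather than aaa.com, just as parse_url(url,'HOST') returns the full host name.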

Answer 1 (score: 0)

I have verified that the query below works:

SELECT
  parse_url(url, 'HOST') AS host,
  q.key AS key,
  q.val AS val,
  COUNT(*)
FROM <your_table_with_url_as_a_field>
LATERAL VIEW explode(str_to_map(parse_url(url,'QUERY'),'&','=')) q AS key, val
WHERE parse_url(url,'QUERY') IS NOT NULL
GROUP BY parse_url(url, 'HOST'), q.key, q.val
ORDER BY host, key, val;
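One detail the queries above leave open: the desired output in the question lists aaa.com and bbb.com, while the HOST part of these URLs is www.aaa.com and www.bbb.com. The Python sketch below shows one way to normalize the host by dropping a leading "www." prefix. The function name bare_domain is hypothetical, and the rule of stripping only "www." is an assumption about what the asker wants.

```python
import re
from urllib.parse import urlsplit


def bare_domain(url):
    """Return the host without a leading 'www.', or None if the URL has no host."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    # Only a leading "www." is removed; other subdomains are left untouched.
    return re.sub(r"^www\.", "", host)


print(bare_domain("http://www.aaa.com/path?key_a=5&key_b=hello"))
```

In Hive, a similar effect could presumably be had by wrapping parse_url(url, 'HOST') in regexp_replace in both the SELECT list and the GROUP BY clause; that variant has not been tested here.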

Answer 2 (score: 0)

Use parse_url_tuple

  1. The src table contains the column completeurl
  2. Then group by host, path, and query

This should solve your problem:

SELECT
  count(*), host, path, query
FROM (
  SELECT b.*
  FROM src
  LATERAL VIEW parse_url_tuple(completeurl, 'HOST',
           'PATH', 'QUERY', 'QUERY:id') b AS host, path, query, query_id
     ) t  -- the subquery in FROM is given an alias, which Hive requires
GROUP BY host, path, query;

For more details, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-parse_url_tuple
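For readers unfamiliar with parse_url_tuple, the following Python sketch emulates what it returns for a single URL: several URL parts extracted in one call, including an individual query parameter via the 'QUERY:<name>' form. The Python function here is a hypothetical stand-in, not the Hive implementation, and it handles only the four part names used in this thread.

```python
from urllib.parse import parse_qs, urlsplit


def parse_url_tuple(url, *parts):
    """Illustrative stand-in: return the requested URL parts as one tuple."""
    split = urlsplit(url)
    params = parse_qs(split.query)
    values = []
    for part in parts:
        if part == "HOST":
            values.append(split.hostname)
        elif part == "PATH":
            values.append(split.path)
        elif part == "QUERY":
            values.append(split.query or None)
        elif part.startswith("QUERY:"):
            # 'QUERY:<name>' picks out a single parameter, or None if absent.
            matches = params.get(part[len("QUERY:"):])
            values.append(matches[0] if matches else None)
        else:
            raise ValueError(f"part not handled in this sketch: {part}")
    return tuple(values)


row = parse_url_tuple(
    "http://www.bbb.com/path?key_a=5&key_b=goodb&key_c=ystrd",
    "HOST", "PATH", "QUERY", "QUERY:key_d",
)
print(row)
```

Because the query string comes back as a single value, grouping by host, path, and query counts whole query strings rather than individual key-value pairs. Splitting into pairs still needs something like str_to_map with explode, as in the other answers.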