How to find frequency of a variable over multiple columns in hive?

Time: 2016-08-31 18:09:35

Tags: hadoop hive

I have data regarding gender of people under 8 columns:

mem1;mem2;mem3;mem4;mem5;mem6;mem7;mem8
MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE

Now I want to find the frequency of MALE, FMALE and UNKN using Hive. Something like:

MALE 4
FMALE 3
UNKN 1

I'm new to Hive but I know we need to use group by. Can someone please help me with the query?

1 answer:

Answer 0 (score: 0)

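Use Hive's reflect function to get the counts.

  1. Create a table that stores the whole row as a single column.

  2. Use reflect to count the occurrences in that column. Example:

    select reflect("org.apache.commons.lang.StringUtils", "countMatches",
                   "MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE", "MALE") as male,
           reflect("org.apache.commons.lang.StringUtils", "countMatches",
                   "MALE;FMALE;UNKN;MALE;FMALE;FMALE;MALE;MALE", "FMALE") as female
    from mytable

Note that countMatches counts substring matches, so "MALE" is also found inside every "FMALE". For the sample row the male column comes out as 7, not 4; subtract the female count from it to get the number of MALE entries.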
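
Since the question mentions group by, here is a minimal sketch of an alternative that is not part of the answer above. It assumes, unlike step 1, that mytable was created with ';' as the field delimiter, so that mem1 through mem8 are eight separate string columns. The aliases t, gender and cnt are illustrative names only.

```sql
-- Assumption: mytable has eight string columns mem1..mem8
-- (e.g. created with ROW FORMAT DELIMITED FIELDS TERMINATED BY ';').
-- array() packs the eight columns of each row into one array,
-- explode() turns that array into one output row per element,
-- and GROUP BY then counts each distinct whole value.
SELECT gender,
       COUNT(*) AS cnt
FROM mytable
LATERAL VIEW explode(array(mem1, mem2, mem3, mem4,
                           mem5, mem6, mem7, mem8)) t AS gender
GROUP BY gender;
```

Because whole values are compared rather than substrings, MALE and FMALE are counted separately. For the sample row this should give MALE 4, FMALE 3 and UNKN 1. If the data file keeps its header line, that line would be counted as data unless it is skipped when the table is defined.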