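While trying to get to grips with Hive, I uploaded census data (income data for people from different countries working in the United States) to an S3 bucket.
I can run other queries, but the simple query below does not work as expected.
I am trying to list people from different countries whose income level is above $50K.
I created the table in Hive and imported the data from the AWS S3 bucket. The income column is defined as a string, and its possible values are '<=50K' and '>50K'.
The following query returns an empty result set. What could be the problem? The same SQL statement runs fine in a regular MySQL console. Why doesn't it show the expected result set in Hive?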
hive> select country, income from census_income_data where income = '>50K';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201312281227_0011, Tracking URL = http://ip-172-31-44-80.us-west-2.compute.internal:9100/jobdetails.jsp?jobid=job_201312281227_0011
Kill Command = /home/hadoop/bin/hadoop job -kill job_201312281227_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-12-28 13:21:05,086 Stage-1 map = 0%, reduce = 0%
2013-12-28 13:21:26,279 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:27,289 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:28,299 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:29,310 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:30,321 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:31,334 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 7.74 sec
2013-12-28 13:21:32,369 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.74 sec
MapReduce Total cumulative CPU time: 7 seconds 740 msec
Ended Job = job_201312281227_0011
Counters:
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 7.74 sec HDFS Read: 219 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 740 msec
OK
Time taken: 56.559 seconds
Below is sample data from the dataset used in the query above:
30, State-gov, 141297, Bachelors, 13, Married-civ-spouse, Prof-specialty, Husband, Asian-Pac-Islander, Male, 0, 0, 40, India, >50K
23, Private, 122272, Bachelors, 13, Never-married, Adm-clerical, Own-child, White, Female, 0, 0, 30, United-States, <=50K
32, Private, 205019, Assoc-acdm, 12, Never-married, Sales, Not-in-family, Black, Male, 0, 0, 50, United-States, <=50K
40, Private, 121772, Assoc-voc, 11, Married-civ-spouse, Craft-repair, Husband, Asian-Pac-Islander, Male, 0, 0, 40, ?, >50K
34, Private, 245487, 7th-8th, 4, Married-civ-spouse, Transport-moving, Husband, Amer-Indian-Eskimo, Male, 0, 0, 45, Mexico, <=50K
25, Self-emp-not-inc, 176756, HS-grad, 9, Never-married, Farming-fishing, Own-child, White, Male, 0, 0, 35, United-States, <=50K
32, Private, 186824, HS-grad, 9, Never-married, Machine-op-inspct, Unmarried, White, Male, 0, 0, 40, United-States, <=50K
38, Private, 28887, 11th, 7, Married-civ-spouse, Sales, Husband, White, Male, 0, 0, 50, United-States, <=50K
43, Self-emp-not-inc, 292175, Masters, 14, Divorced, Exec-managerial, Unmarried, White, Female, 0, 0, 45, United-States, >50K
40, Private, 193524, Doctorate, 16, Married-civ-spouse, Prof-specialty, Husband, White, Male, 0, 0, 60, United-States, >50K
54, Private, 302146, HS-grad, 9, Separated, Other-service, Unmarried, Black, Female, 0, 0, 20, United-States, <=50K
35, Federal-gov, 76845, 9th, 5, Married-civ-spouse, Farming-fishing, Husband, Black, Male, 0, 0, 40, United-States, <=50K
43, Private, 117037, 11th, 7, Married-civ-spouse, Transport-moving, Husband, White, Male, 0, 2042, 40, United-States, <=50K
59, Private, 109015, HS-grad, 9, Divorced, Tech-support, Unmarried, White, Female, 0, 0, 40, United-States, <=50K
56, Local-gov, 216851, Bachelors, 13, Married-civ-spouse, Tech-support, Husband, White, Male, 0, 0, 40, United-States, >50K
19, Private, 168294, HS-grad, 9, Never-married, Craft-repair, Own-child, White, Male, 0, 0, 40, United-States, <=50K
54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K
39, Private, 367260, HS-grad, 9, Divorced, Exec-managerial, Not-in-family, White, Male, 0, 0, 80, United-States, <=50K
49, Private, 193366, HS-grad, 9, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 40, United-States, <=50K
23, Local-gov, 190709, Assoc-acdm, 12, Never-married, Protective-serv, Not-in-family, White, Male, 0, 0, 52, United-States, <=50K
Answer 0: (score: 0)
Your SQL code
select country, income from census_income_data where income = '>50K';
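uses the '=' operator to compare two strings. As far as I know, that operator takes the character set, surrounding whitespace, and so on into account. You may have better luck with the LIKE operator.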
select country, income from census_income_data where income LIKE ">50K";
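One caveat: LIKE with no wildcard in the pattern still has to match the whole string, so it behaves much like '=' here. In the sample data every comma is followed by a space, so if the table splits fields on a single comma (an assumption, since the CREATE TABLE statement is not shown), the stored value would be ' >50K' with a leading space. A minimal sketch under that assumption, using Hive's built-in trim() function:

```sql
-- Assumes income is stored with surrounding whitespace, e.g. ' >50K',
-- because the input file uses ', ' between fields while the table
-- (presumably) splits on ',' only.
SELECT country, income
FROM census_income_data
WHERE trim(income) = '>50K';
```

If this returns rows where the original query returned none, stray whitespace in the column is the likely cause.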
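Answer 1: (score: 0)
First, run select * from census_income_data limit 20 on your table
to verify that the expected values are present in the expected columns.
There may be additional characters, such as spaces, that cause the query to return 0 results.
Try the following: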
select country, income from census_income_data where income like '%50%';
If that does not work, the data probably ended up in the wrong columns when the table was created.
If it does work, try:
select country, income from census_income_data where income like '%>50K%';
If that works, you probably have extra characters in the field. Try running:
select concat('INCOME:',income,'.') from census_income_data where income like '%>50K%';
and check whether you get exactly the string INCOME:>50K.
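The same check can be summarized for the whole table in one query. This is only a sketch using the table and column names from the question; it groups the distinct income values together with their lengths so that invisible characters show up as an unexpected length:

```sql
-- '>50K' is 4 characters and '<=50K' is 5 characters.
-- A reported length one greater than that would suggest an extra
-- character (for example a leading space) stored in the column.
SELECT income,
       length(income) AS income_length,
       count(*)       AS row_count
FROM census_income_data
GROUP BY income, length(income);
```

If the lengths come out larger than expected, comparing against trim(income) instead of income in the WHERE clause is one way to work around it.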