Error when executing a Twitter sentiment analysis query

Asked: 2016-11-14 10:33:57

Tags: hadoop hive

I am executing the following Hive query:

SELECT t.retweeted_screen_name,
       sum(retweets) AS total_retweets,
       count(*) AS tweet_count
FROM (SELECT retweeted_status.user.screen_name AS retweeted_screen_name,
             retweeted_status.text,
             max(retweet_count) AS retweets
      FROM mytweets
      GROUP BY retweeted_status.user.screen_name, retweeted_status.text) t
GROUP BY t.retweeted_screen_name
ORDER BY total_retweets DESC
LIMIT 10;

Log:

Query ID = root_20161114205033_e1736dca-0999-431a-b301-4d1a3bfbaa00
Total jobs = 2
Launching Job 1 out of 2
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO  org.apache.hadoop.hive.ql.Driver - Starting task [Stage-1:MAPRED] in serial mode
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Cache Content Summary for hdfs://localhost:9000/user/flume/tweets length: 1858 file count: 1 directory  count: 1
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO  org.apache.hadoop.hive.ql.exec.Utilities - BytesPerReducer=256000000  maxReducers=1009 totalInputFileSize=1858
Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO hive.ql.Context - New scratch dir is hdfs://localhost:9000/tmp/hive/root/5bd85c7b-bb35-4557-9053-1b7d248538a3/hive_2016-11-14_20-50-33_177_2178154594719247302-1
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO  org.apache.hadoop.hive.ql.exec.mr.ExecDriver - Using  org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.mr.ExecDriver - adding libjars: hdfs://localhost:9000/usr/lib/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Processing alias t:mytweets
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Adding input file  hdfs://localhost:9000/user/flume/tweets
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - Content Summary hdfs://localhost:9000/user/flume/tweets length: 1858 num files: 1 num directories: 1
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO hive.ql.Context - New scratch dir is hdfs://localhost:9000/tmp/hive/root/5bd85c7b-bb35-4557-9053-1b7d248538a3/hive_2016-11-14_20-50-33_177_2178154594719247302-1
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.SerializationUtilities - Serializing MapWork using kryo
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.SerializationUtilities - Serializing ReduceWork using kryo
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - PLAN PATH = hdfs://localhost:9000/tmp/hive/root/5bd85c7b-bb35-4557-9053-1b7d248538a3/hive_2016-11-14_20-50-33_177_2178154594719247302-1/-mr-10006/b726734e-92c6-42bf-abb0-5853ae53bf3d/map.xml
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.exec.Utilities - PLAN PATH = hdfs://localhost:9000/tmp/hive/root/5bd85c7b-bb35-4557-9053-1b7d248538a3/hive_2016-11-14_20-50-33_177_2178154594719247302-1/-mr-10006/b726734e-92c6-42bf-abb0-5853ae53bf3d/reduce.xml
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/lib/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:179)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:98)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:193)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:433)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:138)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://localhost:9000/usr/lib/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar)'

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. File does not exist: hdfs://localhost:9000/usr/lib/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.Driver - Completed executing command(queryId=root_20161114205033_e1736dca-0999-431a-b301-4d1a3bfbaa00); Time taken: 8.332 seconds
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.conf.HiveConf - Using the default value passed in for log id: 5bd85c7b-bb35-4557-9053-1b7d248538a3
[5bd85c7b-bb35-4557-9053-1b7d248538a3 main] INFO org.apache.hadoop.hive.ql.session.SessionState - Resetting thread name to  main
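From the stack trace, job submission fails because the jar path registered with Hive via libjars does not exist in HDFS. A quick way to confirm and, if needed, restore the jar (a sketch: the HDFS path is taken from the log above, while the local source path is a placeholder to adjust):

```shell
# Check whether the SerDe jar Hive registered via libjars actually exists in HDFS
hdfs dfs -ls hdfs://localhost:9000/usr/lib/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar

# If it is missing, upload a local copy (the local path below is an assumption --
# point it at wherever the jar was built or downloaded)
hdfs dfs -mkdir -p /usr/lib
hdfs dfs -put /path/to/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar hdfs://localhost:9000/usr/lib/
```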

Please advise how I can resolve this.

1 Answer:

Answer 0 (score: 0)

1) In hadoop-env.sh under the Hadoop conf directory, add:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:ABSOLUTE_PATH_TO_slf4j-simple-1.7.5.jar_JAR

or 2) place slf4j-simple-1.7.5.jar in the Hadoop lib path.
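A concrete sketch of both options (the local jar location and $HADOOP_HOME layout below are illustrative assumptions; substitute the actual paths on your machine):

```shell
# Option 1: extend HADOOP_CLASSPATH in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# (/usr/local/lib is an assumed location for the jar)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/lib/slf4j-simple-1.7.5.jar

# Option 2: copy the jar into Hadoop's common lib directory instead
cp /usr/local/lib/slf4j-simple-1.7.5.jar "$HADOOP_HOME/share/hadoop/common/lib/"
```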