All of the introductory tutorials and docs I can find on Hadoop use simple/contrived (word count-style) examples, each of which is submitted to MR by:
SSHing into the JobTracker node
Making sure that a JAR file containing the MR job is available on that node (it's the job's input data that lives on HDFS)
Running a command of the form bin/hadoop jar share/hadoop/mapreduce/my-map-reduce.jar <someArgs> that actually submits the job to Hadoop/MR
Either reading the MR result from the command line or opening a text file containing the result
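Concretely, the end-to-end session in those tutorials looks something like this (the hostname, paths, jar name, and main class here are only illustrative; they vary by Hadoop version and distribution):

ssh me@jobtracker.example.com
bin/hadoop fs -put input.txt /user/me/input          # stage the input data on HDFS
bin/hadoop jar share/hadoop/mapreduce/my-map-reduce.jar MyJob /user/me/input /user/me/output
bin/hadoop fs -cat /user/me/output/part-r-00000      # read the result off HDFS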
Although these examples are great for showing total newbies how to work with Hadoop, they don't show me how Java code actually integrates with Hadoop/MR at the API level. I guess I am sort of expecting that:
Hadoop exposes some kind of client access/API for submitting MR jobs to the cluster
Once the jobs are complete, some asynchronous mechanism (callback, listener, etc.) reports the result back to the client
So, something like this (Groovy pseudo-code):
class Driver {
    static void main(String[] args) {
        new Driver().run(args)
    }

    void run(String[] args) {
        // Build the job and hand it a callback to be invoked when the result is ready
        MapReduceJob myBigDataComputation = new SolveTheMeaningOfLifeJob(convertToHadoopInputs(args), new MapReduceCallback() {
            @Override
            void onResult() {
                // Now that you know the meaning of life, do nothing.
            }
        })

        // Point a client at the cluster and submit the job asynchronously
        HadoopClusterClient hadoopClient = new HadoopClusterClient("http://my-hadoop.example.com/jobtracker")
        hadoopClient.submit(myBigDataComputation)
    }
}
So I ask: surely the simple examples in all the introductory tutorials, where you SSH into nodes, run Hadoop from the CLI, and open text files to view the results... surely that can't be the way Big Data companies actually integrate with Hadoop. Surely something along the lines of my pseudo-code snippet above is used to kick off an MR job and fetch its results. What is it?
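For what it's worth, the closest thing I've found on my own is org.apache.hadoop.mapreduce.Job, which does let plain Java code configure and submit a job without SSHing anywhere. Below is a rough sketch of my understanding, using the stock word-count mapper/reducer as filler and assuming the cluster's client config files (core-site.xml, mapred-site.xml, etc.) are on the classpath; note that even here the client polls for completion rather than getting a callback:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {

    // Stock word-count mapper, included only so the sketch is self-contained
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Stock word-count reducer
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // new Configuration() picks up core-site.xml / mapred-site.xml / yarn-site.xml
        // from the classpath, so this can run from any client machine
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "word-count");
        job.setJarByClass(Driver.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // submit() returns immediately; waitForCompletion(true) is the blocking
        // alternative. Either way there is no callback -- the client polls.
        job.submit();
        while (!job.isComplete()) {
            Thread.sleep(5000);
        }
        System.exit(job.isSuccessful() ? 0 : 1);
    }
}

But even with that, I'm still polling for completion and then reading part-r-00000 files off HDFS by hand afterwards, which feels much closer to the tutorial workflow than to the callback-style client API I sketched above.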