I want to use MongoDB together with Hadoop, and I found the Mongo-Hadoop Connector. However, I can't find complete documentation for this example.
The example mongo-hadoop/examples/sensors contains four items: build, run_job.sh, src, and testdata_generator.js. I used testdata_generator.js to load data into MongoDB, in the database demo. When I try to run run_job.sh, I get this exception:
MongoDB shell version: 2.6.1
connecting to: demo
false
Exception in thread "main" java.lang.ClassNotFoundException: -D
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:249)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Contents of run_job.sh:
#!/bin/sh
mongo demo --eval "db.logs_aggregate.drop()"
#Set your HADOOP_HOME directory here.
#export HADOOP_HOME="/Users/mike/hadoop/hadoop-2.0.0-cdh4.3.0"
export HADOOP_HOME="/home/hduser/hadoop"
#FIRST PASS - map all the devices into an output collection
declare -a job1_args
job1_args=("jar" "`pwd`/build/libs/sensors-1.2.1-SNAPSHOT-hadoop_2.2.jar")
#job1_args=(${job1_args[@]} "com.mongodb.hadoop.examples.sensors.Devices")
job1_args=(${job1_args[@]} "-D" "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat")
job1_args=(${job1_args[@]} "-D" "mongo.input.uri=mongodb://localhost:27017/demo.devices")
job1_args=(${job1_args[@]} "-D" "mongo.job.mapper=com.mongodb.hadoop.examples.sensors.DeviceMapper")
job1_args=(${job1_args[@]} "-D" "mongo.job.reducer=com.mongodb.hadoop.examples.sensors.DeviceReducer")
job1_args=(${job1_args[@]} "-D" "mongo.job.output.key=org.apache.hadoop.io.Text")
job1_args=(${job1_args[@]} "-D" "mongo.job.output.value=org.apache.hadoop.io.Text")
job1_args=(${job1_args[@]} "-D" "mongo.output.uri=mongodb://localhost:27017/demo.logs_aggregate")
job1_args=(${job1_args[@]} "-D" "mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat")
$HADOOP_HOME/bin/hadoop "${job1_args[@]}" "$1"
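A note on the trace above: ClassNotFoundException: -D is consistent with how hadoop jar <jarFile> [mainClass] args... chooses its entry point. When the jar's manifest has no Main-Class, Hadoop's RunJar takes the first argument after the jar path as the class name. In run_job.sh, the line that appends com.mongodb.hadoop.examples.sensors.Devices is commented out, so the first argument becomes "-D". The Java sketch below models that rule. RunJarArgs, Resolved, and resolve are hypothetical names, not Hadoop API, and the sketch assumes the sensors jar has no Main-Class entry, which the exception suggests. Restoring the commented-out line should get past this particular error, though it may not be the only problem with the script. The "false" printed before the trace is most likely just db.logs_aggregate.drop() reporting that the collection did not exist yet.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of how "hadoop jar <jarFile> [mainClass] args..."
// picks the class to run. Modeled on the behavior of Hadoop's RunJar; not its source.
public class RunJarArgs {

    /** The class name that would be loaded and the arguments its main() would receive. */
    static final class Resolved {
        final String mainClass;
        final List<String> programArgs;

        Resolved(String mainClass, List<String> programArgs) {
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    /**
     * @param manifestMainClass Main-Class from the jar manifest, or null if there is none
     * @param argsAfterJar      every command-line argument after the jar path
     */
    static Resolved resolve(String manifestMainClass, List<String> argsAfterJar) {
        if (manifestMainClass != null) {
            // The manifest names the entry point, so all arguments go to the program.
            return new Resolved(manifestMainClass, argsAfterJar);
        }
        if (argsAfterJar.isEmpty()) {
            throw new IllegalArgumentException("usage: hadoop jar <jarFile> [mainClass] args...");
        }
        // No Main-Class: the first argument is taken as the class name, whatever it is.
        return new Resolved(argsAfterJar.get(0), argsAfterJar.subList(1, argsAfterJar.size()));
    }

    public static void main(String[] args) {
        // Roughly what run_job.sh passes while the Devices line is commented out.
        List<String> broken = Arrays.asList(
                "-D", "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat");
        System.out.println(resolve(null, broken).mainClass); // prints -D

        // The same arguments with the main class restored in front of them.
        List<String> fixed = Arrays.asList(
                "com.mongodb.hadoop.examples.sensors.Devices",
                "-D", "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat");
        System.out.println(resolve(null, fixed).mainClass); // prints the Devices class name
    }
}
```

In the first case, RunJar would try Class.forName("-D"), which matches the exception in the question.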
I can run the basic MapReduce examples on my machine, but this problem has been bothering me for days...
EDIT:
I can run this example with the following steps:
1. Compile Devices.java, DeviceMapper.java, DeviceReducer.java, and SensorDataGenerator.java into .class files:
javac -classpath [library files] -d [folders] Devices.java DeviceMapper.java DeviceReducer.java SensorDataGenerator.java
2. Package the classes:
jar -cvf [jar file name] -C [path]
3. Run the jar:
jar [jar file name] [class name]
But I don't know why I can't get run_job.sh to run successfully.
Devices.java, the main Java file in this example:
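package com.mongodb.hadoop.examples.sensors;

import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfig;
import com.mongodb.hadoop.util.MongoTool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.ToolRunner;
import java.net.UnknownHostException;

public class Devices extends MongoTool {
    public Devices() throws UnknownHostException {
        Configuration conf = new Configuration();
        MongoConfig config = new MongoConfig(conf);
        setConf(conf);

        // Read from demo.devices and write aggregated results to demo.logs_aggregate.
        config.setInputFormat(MongoInputFormat.class);
        config.setInputURI("mongodb://localhost:27017/demo.devices");
        config.setOutputFormat(MongoOutputFormat.class);
        config.setOutputURI("mongodb://localhost:27017/demo.logs_aggregate");

        // Mapper/reducer classes and their output key/value types.
        config.setMapper(DeviceMapper.class);
        config.setReducer(DeviceReducer.class);
        config.setMapperOutputKey(Text.class);
        config.setMapperOutputValue(Text.class);
        config.setOutputKey(IntWritable.class);
        config.setOutputValue(BSONWritable.class);

        // Run the example's data generator before the job is submitted.
        new SensorDataGenerator().run();
    }

    public static void main(final String[] pArgs) throws Exception {
        System.exit(ToolRunner.run(new Devices(), pArgs));
    }
}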
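Devices.main hands its arguments to ToolRunner.run, and that step (via Hadoop's GenericOptionsParser) is where the script's -D key=value pairs become configuration entries. This only happens once Devices is actually the class that hadoop jar runs. The sketch below is a simplified, hypothetical model of that step. DefineArgsSketch and applyDefines are made-up names, a plain Map stands in for Hadoop's Configuration, and the real parser handles more generic options (such as -conf, -fs, and -libjars) than -D.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of how "-D key=value" pairs become configuration
// entries before Tool.run() is called. Not Hadoop's GenericOptionsParser source.
public class DefineArgsSketch {

    /** Copies each "-D key=value" pair into conf and returns the arguments that are left. */
    static List<String> applyDefines(String[] args, Map<String, String> conf) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            boolean isDefine = "-D".equals(args[i])
                    && i + 1 < args.length
                    && args[i + 1].indexOf('=') > 0;
            if (isDefine) {
                String pair = args[++i];              // consume the "key=value" token
                int eq = pair.indexOf('=');           // split on the first '=' only
                conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            } else {
                remaining.add(args[i]);               // not a -D pair: pass it through
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Two of the -D options that run_job.sh builds (these follow the main class).
        String[] jobArgs = {
                "-D", "mongo.input.uri=mongodb://localhost:27017/demo.devices",
                "-D", "mongo.output.uri=mongodb://localhost:27017/demo.logs_aggregate"
        };
        Map<String, String> conf = new LinkedHashMap<>();
        List<String> rest = applyDefines(jobArgs, conf);
        System.out.println(conf);
        System.out.println(rest); // prints []
    }
}
```

With real Hadoop, GenericOptionsParser.getRemainingArgs() returns the leftover arguments, and ToolRunner passes those on to Tool.run().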
Answer (score: 0)
Run it with Gradle. Those bash scripts are somewhat outdated and should be removed:
./gradlew sensorData