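I'm using the Hadoop-Vertica Connector to load large files into Vertica, and I'm trying to do it with a map-only Hadoop job (no Reducer). However, during the map phase the Vertica output table does not seem to get initialized, and I always get an error.
The documentation doesn't say whether we can write to Vertica from the map phase, so I'm wondering: is this possible?
Thanks!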
Edit
Here is the documentation for the Hadoop Vertica Connector.
Error:
java.io.IOException: Cannot set record by name if names not initialized
at com.vertica.hadoop.VerticaRecord.set(VerticaRecord.java:270)
at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:92)
at com.vertica.hadoop.VerticaWordCount$TokenizerMapper.map(VerticaWordCount.java:60)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doA
Looking at the source code around VerticaWordCount.java, I found that the list of column names for the output table is never initialized.
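To make the failure mode concrete, here is a minimal sketch of what I think is happening. It is a simplified stand-in, not the connector's actual code: the class NamedRecordSketch and its methods are hypothetical, and only the exception message is taken from the stack trace above. It assumes the record keeps a list of column names and refuses name-based writes when that list was never filled in.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a record type; NOT the connector's VerticaRecord.
public class NamedRecordSketch {
    private final List<String> names; // column names; null when never initialized
    private final Object[] values;    // one slot per column

    public NamedRecordSketch(List<String> names, int columns) {
        this.names = names;
        this.values = new Object[columns];
    }

    // Setting by position needs no column names (in this sketch).
    public void set(int index, Object value) {
        values[index] = value;
    }

    // Setting by name fails when the names were never initialized.
    public void set(String name, Object value) throws IOException {
        if (names == null || names.isEmpty()) {
            throw new IOException("Cannot set record by name if names not initialized");
        }
        int index = names.indexOf(name);
        if (index < 0) {
            throw new IOException("Unknown column: " + name);
        }
        values[index] = value;
    }

    public Object get(int index) {
        return values[index];
    }

    public static void main(String[] args) throws IOException {
        // Names initialized: the name-based set succeeds.
        NamedRecordSketch ok = new NamedRecordSketch(Arrays.asList("a", "b", "c"), 3);
        ok.set("b", "hello");
        System.out.println(ok.get(1));

        // Names missing: the name-based set throws, as in my map task.
        NamedRecordSketch broken = new NamedRecordSketch(null, 3);
        try {
            broken.set("a", 42);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real record behaves like this, the question becomes why the column names are not populated when the record is created inside the mapper.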
Here is my configuration in run():
Job job = new Job(conf, "vertica hadoop");
conf = job.getConfiguration();
conf.set("mapreduce.job.tracker", "local");
//job.setInputFormatClass(VerticaInputFormat.class);
//You have to set the MapOutputKeyClass and MapOutputValueClass,
//since by default it will be the same as the class of Reducer's
//Output Key and Value
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(VerticaRecord.class);
/*************Settings for Vertica output************************/
//Set the output format of Reduce class.
//I will output VerticaRecords that will be stored in the database
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(VerticaRecord.class);
//Tell Hadoop to send its output to the Vertica
job.setOutputFormatClass(VerticaOutputFormat.class);
/****************************************************************/
job.setJarByClass(VerticaWordCount.class);
job.setMapperClass(TokenizerMapper.class);
FileInputFormat.addInputPath(job, new Path("/user/tmp/input"));
/******************************************************************/
//Defining the output table
//VerticaOutputFormat.setOutput(jobObject, tableName, [truncate, ["columnName1 dataType1" [, "columnNameN dataTypeN" ...]]]);
VerticaOutputFormat.setOutput(job, "target", true, "a int", "b varchar", "c varchar");