I am using MapReduce to insert data into Elasticsearch.
Here is my code:
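import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.elasticsearch.hadoop.mr.EsOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CreditBreauDriver extends Configured implements Tool {

    private static final Logger logger = LoggerFactory.getLogger(CreditBreauDriver.class);

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new CreditBreauDriver(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        logger.debug("Entering CreditBreauDriver.run()");
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }

        // Build the job from getConf() so the generic options parsed by ToolRunner are honored
        // (assumes Hadoop 2 or later, where Job.getInstance replaces the deprecated constructor).
        Job job = Job.getInstance(getConf());
        job.setJarByClass(CreditBreauDriver.class);
        job.setJobName("Elastic-Test");

        logger.info("Input path " + args[0]);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Elasticsearch connection settings; args[1] is unused because EsOutputFormat writes to the index.
        Configuration conf = job.getConfiguration();
        conf.set("es.nodes", "http://192.168.63.128:9200");
        conf.set("es.resource", "es/credit");
        //conf.set("es.mapping.id", "_id");

        job.setMapperClass(CreditBureauMapper.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);

        int returnValue = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("job.isSuccessful " + job.isSuccessful());
        logger.debug("Exiting CreditBreauDriver.run()");
        return returnValue;
    }
}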
This is my mapper:
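import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CreditBureauMapper extends Mapper<Object, Text, Text, MapWritable> {

    private static final Logger logger = LoggerFactory.getLogger(CreditBureauMapper.class);

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        logger.debug("Entering CreditBureauMapper.map() " + this);
        String line = value.toString();
        String[] splittedLine = line.split(",");

        // Skip rows with fewer than four fields instead of failing on an out-of-range index.
        if (splittedLine.length < 4) {
            logger.warn("Skipping malformed line: " + line);
            return;
        }

        // Each input row becomes one document with a doc_id field and a content field.
        MapWritable mapWritable = new MapWritable();
        //mapWritable.put(new Text("_id"), new Text(splittedLine[0]));
        //mapWritable.put(new Text(splittedLine[0]), new Text(splittedLine[1]+","+splittedLine[2]+","+splittedLine[3]));
        mapWritable.put(new Text("doc_id"), new Text(splittedLine[0]));
        mapWritable.put(new Text("content"), new Text(splittedLine[1] + "," + splittedLine[2] + "," + splittedLine[3]));
        context.write(value, mapWritable);
        logger.debug("Exiting CreditBureauMapper.map() " + value);
    }
}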
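My input data has about 100 rows, but only 10 of them were inserted.
Note: the rows are all different.
Do I need to change some property? This is a single-node Elasticsearch running in my local VM.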
Answer 0 (score: 2)
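Elasticsearch only returns as many results as the size parameter specifies, and size defaults to 10 if you do not set it. The response tells you the total number of documents matching your query in hits.total, so it is quite possible that all of your rows were indexed and the search is simply showing you the first 10.
If you know you only have 475 documents, you can set size to 500 and get them all in one request. Of course, this stops working as soon as the number of documents grows beyond the size you set, and it becomes very impractical if you expect many documents, perhaps several thousand.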
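As a rough illustration of the size parameter, the sketch below builds the two request URIs you might use to check what was really indexed: a `_count` request and a `_search` request with an explicit `size`. The class `EsSearchUris` and its methods are hypothetical helpers written for this answer, not part of Elasticsearch or es-hadoop; the node address and the `es/credit` resource are taken from the question. It assumes an Elasticsearch version that still accepts an index/type path.

```java
import java.net.URI;

// Hypothetical helper (not part of Elasticsearch or es-hadoop): builds the
// request URIs for checking how many documents were actually indexed.
class EsSearchUris {

    // Search URI with an explicit page size; without "size" only 10 hits are returned.
    static URI searchUri(String node, String resource, int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        return URI.create(stripTrailingSlash(node) + "/" + resource + "/_search?size=" + size);
    }

    // Count URI: reports the number of matching documents without returning them.
    static URI countUri(String node, String resource) {
        return URI.create(stripTrailingSlash(node) + "/" + resource + "/_count");
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        String node = "http://192.168.63.128:9200"; // es.nodes from the question
        String resource = "es/credit";              // es.resource from the question
        System.out.println(countUri(node, resource));
        System.out.println(searchUri(node, resource, 500));
    }
}
```

Requesting the printed URIs with curl or a browser should show the document count and up to 500 hits respectively; if the count is about 100, the job inserted everything and only the default page size was hiding it.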
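The best way to get all documents is to fetch them in batches with a scan and scroll search, although this is a bit more complicated. This page of the Elasticsearch documentation explains how to do it with curl.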
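The batching idea behind a scroll can be sketched as a simple loop: open the scroll, collect each batch, and send the returned scroll id back until a batch comes back empty. The types `ScrollPage` and `ScrollClient` below are hypothetical and exist only for illustration; a real implementation would back them with the HTTP scroll requests described in the Elasticsearch documentation, whose exact form depends on your version. The in-memory client stands in for a cluster so the loop can be run on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a scroll loop. ScrollPage and ScrollClient are hypothetical types
// introduced for illustration; they are not part of any Elasticsearch client.
class ScrollSketch {

    // One batch of hits plus the scroll id needed to request the next batch.
    static final class ScrollPage {
        final String scrollId;
        final List<String> hits;

        ScrollPage(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    // Abstracts the two calls: the initial search that opens the scroll,
    // and the follow-up request that sends the scroll id back.
    interface ScrollClient {
        ScrollPage open(int batchSize);
        ScrollPage next(String scrollId);
    }

    // Keeps requesting batches until one comes back empty.
    static List<String> fetchAll(ScrollClient client, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> all = new ArrayList<>();
        ScrollPage page = client.open(batchSize);
        while (!page.hits.isEmpty()) {
            all.addAll(page.hits);
            page = client.next(page.scrollId);
        }
        return all;
    }

    // In-memory stand-in for Elasticsearch; the scroll id is just the next offset.
    static ScrollClient inMemory(final List<String> docs) {
        return new ScrollClient() {
            private int batch;

            public ScrollPage open(int batchSize) {
                batch = batchSize;
                return slice(0);
            }

            public ScrollPage next(String scrollId) {
                return slice(Integer.parseInt(scrollId));
            }

            private ScrollPage slice(int from) {
                int to = Math.min(from + batch, docs.size());
                return new ScrollPage(String.valueOf(to), new ArrayList<>(docs.subList(from, to)));
            }
        };
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            docs.add("doc-" + i);
        }
        // Fetch 100 documents in batches of 10.
        System.out.println(fetchAll(inMemory(docs), 10).size()); // prints 100
    }
}
```

Unlike a single request with a large size, the loop never needs to know the total in advance, which is why it keeps working as the index grows.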