Only 10 documents get inserted into Elasticsearch

Time: 2016-05-16 11:04:50

Tags: hadoop elasticsearch mapreduce

I am inserting into Elasticsearch using MapReduce.

Here is my code:

// Imports added for completeness (EsOutputFormat comes from the es-hadoop connector).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.elasticsearch.hadoop.mr.EsOutputFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CreditBreauDriver extends Configured implements Tool {

    private final Logger logger = LoggerFactory.getLogger(CreditBreauDriver.class);

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new CreditBreauDriver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        logger.debug("Entering CreditBreauDriver.run()");
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        Job job = new Job();
        job.setJarByClass(CreditBreauDriver.class);
        job.setJobName("Elastic-Test");
        logger.info("Input path " + args[0]);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        Configuration conf = job.getConfiguration();
        conf.set("es.nodes", "http://192.168.63.128:9200");
        conf.set("es.resource", "es/credit");
        //conf.set("es.mapping.id", "_id");
        job.setMapperClass(CreditBureauMapper.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        int returnValue = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("job.isSuccessful " + job.isSuccessful());
        logger.debug("Exiting CreditBreauDriver.run()");
        return returnValue;
    }
}

Here is my mapper:

import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CreditBureauMapper extends Mapper<Object, Text, Text, MapWritable> {

    private final Logger logger = LoggerFactory.getLogger(CreditBureauMapper.class);

    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        logger.debug("Entering CreditBureauMapper.map() " + this);
        String line = value.toString();
        String[] splittedLine = line.split(",");

        MapWritable mapWritable = new MapWritable();
        //mapWritable.put(new Text("_id"), new Text(splittedLine[0]));
        //mapWritable.put(new Text(splittedLine[0]), new Text(splittedLine[1]+","+splittedLine[2]+","+splittedLine[3]));
        mapWritable.put(new Text("doc_id"), new Text(splittedLine[0]));
        mapWritable.put(new Text("content"), new Text(splittedLine[1] + "," + splittedLine[2] + "," + splittedLine[3]));
        context.write(value, mapWritable);
        logger.debug("Exiting CreditBureauMapper.map() " + value);
    }
}

My input data has about 100 rows, but only 10 were inserted. Note: the rows are all distinct.
Do I need to change some property? This is a single-node Elasticsearch in my local VM.

1 answer:

Answer 0 (score: 2):

Elasticsearch only returns as many results as the `size` parameter allows, and if you do not set it, it defaults to 10. The response tells you the total number of documents matching your query in `hits.total`.
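As a quick check that all the documents really were indexed, the count endpoint ignores the display limit entirely. Here is a sketch against the host and the `es/credit` index/type taken from the question's configuration (it of course needs the cluster to be running):

```sh
# Host and index/type are the ones from the question's job config; adjust to your setup.
curl -XGET 'http://192.168.63.128:9200/es/credit/_count?pretty'
# The "count" field should report every indexed document (about 100 here),
# even though a plain search shows only the first 10 hits.
```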

If you know you only have 475 documents, you can set the size to 500 and fetch all of them in one request. Of course, this stops working once your document count grows past the size you set, and if you expect to get many documents, perhaps thousands, it becomes quite impractical.
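For this question's roughly 100 rows, a single search with an explicit `size` would already show everything. A sketch, again assuming the question's host and `es/credit` index/type:

```sh
# Ask for up to 100 hits instead of the default 10.
curl -XGET 'http://192.168.63.128:9200/es/credit/_search?size=100&pretty' -d '
{
  "query": { "match_all": {} }
}'
```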

The best way to get all documents is to fetch them in batches using a scan-and-scroll search, although that is a bit more involved. This page of the Elasticsearch documentation explains how to do it via curl.
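A minimal scroll sketch, using the Elasticsearch 1.x/2.x-era syntax current at the time of the question (host and index/type are the question's; the scroll id handling is abbreviated):

```sh
# 1) Open a scroll context kept alive for 1 minute, returning 50 hits per batch.
curl -XGET 'http://192.168.63.128:9200/es/credit/_search?scroll=1m&size=50' -d '
{ "query": { "match_all": {} } }'

# 2) The response contains a _scroll_id; pass it back to fetch the next batch,
#    and repeat until a batch comes back with no hits.
curl -XGET 'http://192.168.63.128:9200/_search/scroll?scroll=1m' -d '<the _scroll_id from the previous response>'
```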