
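I have a Hive table that I am trying to index into SolrCloud using morphline. However, the data behind the Hive table is a single 20 GB file, and morphline takes a very long time to process it.

Instead of running multiple mappers and reducers, the job runs only one mapper, probably because there is only one input file.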

yarn jar /opt/<path>/search-mr-1.0.0-cdh5.5.1-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
--morphline-file morphlines.conf \
--output-dir hdfs://<outputdir> \
--zk-host node1.datafireball.com:2181/solr \
--collection <collectionname> \
--input-list <filewherethedatais> \
--mappers 6

Even with --mappers 6, it still launches only one task, and it takes forever. Can anyone shed some light on this?
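One thing that might be worth trying, assuming the single input file really is the cause and that the data is plain line-oriented text (for example, delimited rows): split the file into several smaller files on line boundaries and point the job at those instead. The sketch below relies on GNU coreutils `split`; the function name `split_by_lines` and all paths are hypothetical, and the upload step is left as a comment because it depends on the cluster.

```shell
#!/usr/bin/env bash
# Sketch only: split one large line-oriented file into several smaller files
# so that a job which assigns whole files to mappers has more than one input.
# Assumes GNU coreutils `split`; function name and paths are hypothetical.

# split_by_lines INPUT OUTDIR PARTS
# Writes PARTS files named part-00, part-01, ... into OUTDIR.
# The "l/N" chunk mode never breaks a line across two files.
split_by_lines() {
  local input="$1" outdir="$2" parts="$3"
  mkdir -p "$outdir"
  split --number="l/${parts}" --numeric-suffixes --suffix-length=2 \
    "$input" "${outdir}/part-"
}

# Example usage (placeholder paths, not taken from the question):
#   split_by_lines /tmp/big_file.txt /tmp/chunks 6
#   hdfs dfs -put /tmp/chunks/part-* <hdfs input dir>
# Then list the uploaded part files, one per line, in the file passed to
# --input-list.
```

Whether this helps depends on the tool actually distributing work per input file, which is only my suspicion here, so it is worth testing on a small sample first.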

Resources you might find useful:

Cloudera Mapreduce Batch Index into Solrcloud
Kitesdk which morphline belongs to.

Morphline reading a big file

0 answers: