I'm banging my head against elasticsearch-hadoop. I have this simple Cascading job:
import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import org.elasticsearch.hadoop.cascading.EsTap;

public class Main {
    public static void main(String[] args) {
        String inPath = args[0];
        String outPath = args[1];
        try {
            Properties properties = new Properties();
            AppProps.setApplicationJarClass(properties, Main.class);
            HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

            // read tab-delimited (url, title) records from HDFS
            Tap inTap = new Hfs(new TextDelimited(new Fields("url", "title")), inPath);
            // write the same records to Elasticsearch
            Tap outTap = new EsTap(outPath, new Fields("url", "title"));

            Pipe copyPipe = new Pipe("copy");
            FlowDef flowDef = FlowDef.flowDef()
                .addSource(copyPipe, inTap)
                .addTailSink(copyPipe, outTap);
            flowConnector.connect(flowDef).complete();
        } catch (Exception e) {
            System.err.println("Exception running job: " + e.getMessage());
            e.printStackTrace(System.err);
            System.exit(-1);
        }
    }
}
This should read from HDFS and write to ES. Running the job, it stops with this traceback:
Exception running job: unhandled exception
cascading.flow.FlowException: unhandled exception
Caused by: java.net.UnknownHostException: BD_C02HD0BPDV11: BD_C02HD0BPDV11
where BD_C02HD0BPDV11 is the name of my machine, which should be enough since I have a local ES installation. Anyway, silly me, I tried adding properties to point it at my remote ES instance instead:
Properties properties = new Properties();
properties.setProperty("es.nodes","yada-yada.com");
properties.setProperty("es.port","9300");
but the same error message comes up (with the same hostname, too).
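For what it's worth, the exception looks like plain local-hostname resolution failing rather than anything ES-specific. A minimal standalone check along these lines (my own sketch, HostnameCheck is just a throwaway name) should throw the same kind of UnknownHostException if BD_C02HD0BPDV11 doesn't resolve:

import java.net.InetAddress;

// Throwaway check, not part of the Cascading job: if the machine's own
// hostname does not resolve, getLocalHost() fails with the same
// java.net.UnknownHostException: <hostname>: <hostname>
public class HostnameCheck {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
    }
}

If that check fails too, it would point at the Hadoop side of things (hostname / /etc/hosts) rather than at the es.nodes / es.port settings.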
So I'm lost. Is this a configuration error where I'm missing something obvious, or am I lacking some core concept?