I'm trying to find techniques to make Hadoop faster. Are there any in-memory MapReduce implementations out there? Open-source technologies like GridGain? For GridGain I could only download an evaluation version.
Answer (score: 1)
You might be looking for Apache Spark:
To run programs faster, Spark offers a general execution model
that can optimize arbitrary operator graphs, and supports in-memory
computing, which lets it query data faster than disk-based engines like Hadoop.
It is a bit different from your code, though, since it was designed primarily for Scala. Instead of writing map and reduce functions, you build computation blocks declaratively, which makes Spark more flexible than MapReduce.
Let's look at WordCount; the Java version is a bit verbose:
// Older Spark Java API; in Spark 1.x+ use words.mapToPair(...) instead of map(...)
JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);  // pair each word with the count 1
    }
});
JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) {
        return i1 + i2;  // sum the counts per word
    }
});
With Java 8 lambdas this could look much nicer.
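To give a feel for how much lambdas shrink the anonymous-class boilerplate above, here is the same word-count logic written with plain Java 8 streams. Note this is a standalone sketch, not the Spark API: the class name WordCount and the sample input are made up for illustration, and `Collectors.groupingBy`/`counting` play the role that `reduceByKey` plays in Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Word count with Java 8 lambdas: flatMap words, then group and count,
    // mirroring the flatMap -> map -> reduceByKey pipeline in the Spark examples.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))   // split each line into words
                .collect(Collectors.groupingBy(w -> w,             // group identical words
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("to be or not to be")));
    }
}
```

The Spark Java 8 API follows the same shape: each anonymous `call(...)` collapses into a one-line lambda.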
In Scala it is much more compact:
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")