I'm trying to find techniques to make Hadoop faster. Are there any in-memory MapReduce implementations out there? Open-source technologies like GridGain? For GridGain I could only download an evaluation version.
Answer (score: 1)
You might be looking for Apache Spark:
To run programs faster, Spark offers a general execution model
that can optimize arbitrary operator graphs, and supports in-memory
computing, which lets it query data faster than disk-based engines like Hadoop.
It is a bit different from your code, though, since it was designed primarily for Scala. Instead of writing map and reduce functions, you build computation blocks declaratively, which makes Spark more flexible than MapReduce.
Let's look at WordCount; the Java version is a bit verbose:
// Older Spark Java API; in Spark 1.x+ use words.mapToPair(...) instead of map(...)
JavaPairRDD<String, Integer> ones = words.map(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);  // pair each word with the count 1
    }
});
JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer i1, Integer i2) {
        return i1 + i2;  // sum the counts per word
    }
});
With Java 8 lambdas this could look much nicer.
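To give a feel for how much lambdas shrink the anonymous-class boilerplate above, here is the same word-count logic written with plain Java 8 streams. Note this is a standalone sketch, not the Spark API: the class name WordCount and the sample input are made up for illustration, and `Collectors.groupingBy`/`counting` play the role that `reduceByKey` plays in Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Word count with Java 8 lambdas: flatMap words, then group and count,
    // mirroring the flatMap -> map -> reduceByKey pipeline in the Spark examples.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))   // split each line into words
                .collect(Collectors.groupingBy(w -> w,             // group identical words
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("to be or not to be")));
    }
}
```

The Spark Java 8 API follows the same shape: each anonymous `call(...)` collapses into a one-line lambda.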
In Scala it is much more compact:
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
.map(word => (word, 1))
.reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")