Problem writing Spark Streaming results to HDFS or the local file system

Time: 2017-11-26 14:19:41

Tags: apache-spark hdfs spark-streaming

Using the Java API, I have written a Spark Streaming application that processes and prints its results correctly, and now I want to write those results to HDFS. The versions are:

  1. Hadoop 2.7.3
  2. Spark 2.2.0
  3. Java 1.8
Here is the code:

    import java.util.*;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.kafka010.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    
    public class Spark {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("Spark Streaming").setMaster("local[*]");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(1000));
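
            // Kafka consumer configuration: brokers, key/value deserializers, consumer group, offset handling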
            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "kafka1:9092,kafka2:9092");
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", ByteArrayDeserializer.class);
            kafkaParams.put("group.id", "use");
            kafkaParams.put("auto.offset.reset", "earliest");
            kafkaParams.put("enable.auto.commit", false);
    
            Collection<String> topics = Arrays.asList("testStr");
    
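            // Direct Kafka stream subscribed to the testStr topic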
            JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
                    KafkaUtils.createDirectStream(
                            ssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, byte[]>Subscribe(topics, kafkaParams)
                    );
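
            // Convert each record's payload with finall() and write one text output directory
            // per batch, named spark-<timestamp>.txt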
            stream.map(record -> finall(record.value())).map(record -> Arrays.deepToString(record)).dstream().saveAsTextFiles(
                    "spark", "txt"
            );
    
            ssc.start();
            ssc.awaitTermination();
    
        }
    
        public static String[][] finall(byte[] record){
    
            String[][] result = new String[4][];
            result[0] = javaTest.bytePrintable(record);
            result[1] = javaTest.hexTodecimal(record);
            result[2] = javaTest.hexToOctal(record);
            result[3] = javaTest.hexTobin(record);
    
            return result;
        }
    }
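
For reference (not part of the original question): DStream.saveAsTextFiles(prefix, suffix) writes one directory per batch interval, named prefix-<time in ms>.suffix, and a relative prefix such as "spark" is resolved against whatever the default Hadoop file system and working/user directory happen to be. Below is a minimal sketch of the save call from the code above rewritten with an explicit HDFS target, where hdfs://namenode:9000/user/test is a purely hypothetical location:

    // Same pipeline as in the code above, but with an explicit (hypothetical) HDFS URI as the prefix,
    // so each batch lands in hdfs://namenode:9000/user/test/spark-<timestamp>.txt
    stream.map(record -> finall(record.value()))
            .map(decoded -> Arrays.deepToString(decoded))
            .dstream()
            .saveAsTextFiles("hdfs://namenode:9000/user/test/spark", "txt");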
    

But nothing shows up in HDFS or on the local file system, and I get the following error:

    ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3) java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()Lorg/apache/hadoop/fs/FileSystem$Statistics$StatisticsData;

What is the problem? Do I need to import some libraries from Hadoop?
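
As a diagnostic aside (not part of the original question): this kind of NoSuchMethodError usually means the Hadoop classes found at runtime come from a different, older Hadoop release than the one the code was compiled against, since FileSystem.Statistics.StatisticsData was only added partway through the Hadoop 2.x line. A small sketch that prints which Hadoop version is actually on the classpath:

    import org.apache.hadoop.util.VersionInfo;

    public class HadoopVersionCheck {
        public static void main(String[] args) {
            // VersionInfo reports the version of the Hadoop jars actually resolved on the classpath;
            // with a consistent build this should print 2.7.3.
            System.out.println("Hadoop version: " + VersionInfo.getVersion());
        }
    }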

UPDATE

I used the local Spark jars instead of the Maven dependencies, and it works. So something in the dependencies must be wrong. Here is the pom.xml:

    <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.2.0</version>
    </dependency>
    <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.0</version>
    </dependency>
    <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.2.0</version>
    </dependency>
    

Which one is incompatible? Or maybe something is missing!
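
One guess (an assumption, not a verified fix): the spark-* 2.2.0 Maven artifacts pull in their own hadoop-client (a 2.6.x version by default), so the Hadoop jars resolved by Maven may not match the 2.7.3 cluster, or an older Hadoop jar may be leaking onto the classpath from elsewhere. A hedged sketch of pinning hadoop-client to the cluster version:

    <!-- Hypothetical addition: align the Hadoop client jars with the 2.7.3 cluster -->
    <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
    </dependency>

Running `mvn dependency:tree -Dincludes=org.apache.hadoop` also shows which Hadoop versions the build actually resolves, which should point at the conflicting artifact.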

0 Answers:

No answers.