Update or insert a stream using the Spark Cassandra Connector

Asked: 2018-12-11 13:37:21

Tags: apache-spark cassandra spark-streaming spark-cassandra-connector spark-streaming-kafka

The Spark Cassandra Connector inserts all of the streaming data into the Cassandra DB just fine. However, this is not the result I am after.

What I want to achieve is to add the value to the existing row in the database when the employeetitle column matches, rather than always inserting a new row.

Here is what I have so far:

    // Create a direct Kafka stream from the given brokers and topics
    JavaInputDStream<ConsumerRecord<String, Loan>> messages = KafkaUtils.createDirectStream(
            javaStreamingContext,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.Subscribe(topicsSet, kafkaParams));

    JavaDStream<Loan> loanDStream = messages.map(record -> record.value());
    loanDStream.foreachRDD((loanJavaRDD, time) -> {
        System.out.println("Count " + loanJavaRDD.count());
    });

    // Sum the loan amounts per employee title over a 1-minute window sliding every 10 seconds
    JavaDStream<Loan> window = loanDStream.window(Durations.minutes(1), Durations.seconds(10));
    JavaPairDStream<String, BigDecimal> employeeTitleLoanPair = window
            .mapToPair(loan -> new Tuple2<>(loan.getEmployeeTitle(), loan.getLoanAmount()))
            .reduceByKey(BigDecimal::add);
    employeeTitleLoanPair.print();

    // Map the EmployeeLoan bean fields to the Cassandra table columns
    Map<String, String> columnNameMappings = new HashMap<String, String>();
    columnNameMappings.put("id", "id");
    columnNameMappings.put("employeeTitle", "employeetitle");
    columnNameMappings.put("totalLoan", "totalloan");

    // Write each aggregated pair to Cassandra as a new row with a random id
    employeeTitleLoanPair.foreachRDD((pairsRDD, time) -> {
        CassandraJavaUtil
                .javaFunctions(pairsRDD
                        .map(pair -> new EmployeeLoan(UUID.randomUUID(), pair._1, pair._2))
                        .filter(employeeLoan -> !employeeLoan.getEmployeeTitle().equals("")))
                .writerBuilder("loan_keyspace", "emp_title_loans",
                        CassandraJavaUtil.mapToRow(EmployeeLoan.class, columnNameMappings))
                .saveToCassandra();
    });

    // Start the computation
    javaStreamingContext.start();
    javaStreamingContext.awaitTermination();

How can I check whether the employeetitle column matches and then either update the existing row or insert a new one?
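
What I have in mind is roughly the read-then-write logic below (an untested sketch, not a working solution: it assumes emp_title_loans is keyed by employeetitle rather than by the random id, and it uses CassandraConnector.openSession() as exposed by the 2.x connector with the 3.x Java driver, which may differ in other versions):

    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.spark.connector.cql.CassandraConnector;

    // Reuses javaStreamingContext and employeeTitleLoanPair from the snippet above.
    CassandraConnector connector = CassandraConnector.apply(
            javaStreamingContext.sparkContext().getConf());

    employeeTitleLoanPair.foreachRDD((pairsRDD, time) -> {
        pairsRDD.foreachPartition(pairs -> {
            // openSession() hands out a pooled session; close() only releases the reference.
            try (Session session = connector.openSession()) {
                while (pairs.hasNext()) {
                    Tuple2<String, BigDecimal> pair = pairs.next();
                    if (pair._1().isEmpty()) {
                        continue; // skip empty employee titles, as the filter above does
                    }
                    // Read the current total for this employee title, if the row exists.
                    Row existing = session.execute(
                            "SELECT totalloan FROM loan_keyspace.emp_title_loans WHERE employeetitle = ?",
                            pair._1()).one();
                    BigDecimal newTotal = (existing == null)
                            ? pair._2()
                            : existing.getDecimal("totalloan").add(pair._2());
                    // INSERT in Cassandra is an upsert, so this either creates the row or
                    // replaces the total for the matching employeetitle.
                    session.execute(
                            "INSERT INTO loan_keyspace.emp_title_loans (employeetitle, totalloan) VALUES (?, ?)",
                            pair._1(), newTotal);
                }
            }
        });
    });

Is this per-key read-before-write a reasonable way to do it with the connector, or is there something built in (joinWithCassandraTable, for example) that handles the update-or-insert case more directly?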

0 Answers:

There are no answers yet.