Kafka consumer reads messages multiple times when processing them with Spark

Date: 2020-07-22 00:29:28

Tags: scala apache-spark hadoop apache-kafka kafka-consumer-api

I have a Kafka consumer that reads messages from a topic and writes them to a Hive table using Spark. When I run the code on YARN, it reads the same messages multiple times. There are about 100,000 messages in the topic, but my consumer keeps reading the same ones repeatedly. When I take a distinct count, I get the actual number.

Here is the code I wrote. I would like to know whether I am missing any settings.

    import java.util.{Collections, Properties}

    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.spark.sql.{SaveMode, SparkSession}

    import scala.collection.JavaConverters._

    val spark = SparkSession.builder()
      .appName("Kafka Consumer")
      .enableHiveSupport()
      .getOrCreate()

    import spark.implicits._

    val kafkaConsumerProperty = new Properties()
    kafkaConsumerProperty.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "---")
    kafkaConsumerProperty.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaConsumerProperty.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
    kafkaConsumerProperty.put(ConsumerConfig.GROUP_ID_CONFIG, "draw_attributes")
    kafkaConsumerProperty.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
    kafkaConsumerProperty.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")
    val topic = "space_orchestrator"
    val kafkaConsumer = new KafkaConsumer[String, String](kafkaConsumerProperty)
    kafkaConsumer.subscribe(Collections.singletonList(topic))

    while (true) {
      // ConsumerRecords is a Java collection, so convert it before mapping.
      val recordSeq = kafkaConsumer.poll(10000).asScala.toSeq.map(_.value())
      if (recordSeq.nonEmpty) {
        val newDf = spark.read.json(recordSeq.toDS)
        newDf.write.mode(SaveMode.Overwrite).saveAsTable("dmart_dev.draw_attributes")
      }
    }

1 Answer:

Answer 0 (score: 1)

Alternatively, try managing offsets manually. To do so, disable auto-commit (enable.auto.commit = false). For manual commits, KafkaConsumer offers two methods: commitSync() and commitAsync(). As the names suggest, commitSync() is a blocking call that returns only after the offsets have been committed successfully, while commitAsync() returns immediately.
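A minimal sketch of this manual-commit pattern, adapted to the question's consumer loop (the broker address "---", group id, and topic name are placeholders carried over from the question; this assumes the standard kafka-clients API, where poll(Duration) is available from Kafka 2.0 onward):

```scala
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}

import scala.collection.JavaConverters._

val props = new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "---") // placeholder, as in the question
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
props.put(ConsumerConfig.GROUP_ID_CONFIG, "draw_attributes")
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
// Disable auto-commit so offsets advance only after a successful write.
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("space_orchestrator"))

while (true) {
  val records = consumer.poll(Duration.ofSeconds(10)).asScala.map(_.value()).toSeq
  if (records.nonEmpty) {
    // ... write the batch out (e.g. via Spark, as in the question) ...

    // Commit only after the batch has been persisted. commitSync() blocks
    // until the broker acknowledges the offsets, so an unpersisted batch
    // is never marked as consumed and will be re-read after a restart.
    consumer.commitSync()
  }
}
```

Note that this gives at-least-once rather than exactly-once delivery: if the process dies between the write and commitSync(), the batch is read again on restart, so the write path should tolerate replays.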