I have a Kafka consumer that reads messages from a topic and writes them to a Hive table using Spark. When I run the code on YARN, it reads the same messages multiple times. The topic contains roughly 100,000 messages, but my consumer reads the same ones over and over; when I take a distinct count, I get the actual number.
Here is the code I wrote. I'd like to know whether I'm missing any settings.
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.spark.sql.{SaveMode, SparkSession}
import scala.collection.JavaConverters._

val spark = SparkSession.builder()
  .appName("Kafka Consumer")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

val kafkaConsumerProperty = new Properties()
kafkaConsumerProperty.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "---")
kafkaConsumerProperty.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
kafkaConsumerProperty.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer")
kafkaConsumerProperty.put(ConsumerConfig.GROUP_ID_CONFIG, "draw_attributes")
kafkaConsumerProperty.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
kafkaConsumerProperty.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true")

val topic = "space_orchestrator"
val kafkaConsumer = new KafkaConsumer[String, String](kafkaConsumerProperty)
kafkaConsumer.subscribe(Collections.singletonList(topic))

while (true) {
  val recordSeq = kafkaConsumer.poll(10000).asScala.toSeq.map(_.value())
  if (recordSeq.nonEmpty) {
    val newDf = spark.read.json(recordSeq.toDS)
    newDf.write.mode(SaveMode.Overwrite).saveAsTable("dmart_dev.draw_attributes")
  }
}
Answer 0 (score: 1)
Alternatively, try committing offsets manually. For that, auto-commit should be disabled (enable.auto.commit = false). For manual committing, KafkaConsumer offers two methods, namely commitSync() and commitAsync(). As the names suggest, commitSync() is a blocking call that returns only after the offsets have been committed successfully, while commitAsync() returns immediately.
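
A minimal sketch of what that configuration change looks like, reusing the group id and deserializers from the question. The helper function and the "localhost:9092" broker address in the usage note are illustrative assumptions; the one substantive change is enable.auto.commit = false, after which the poll loop must call commitSync() (or commitAsync()) itself once the batch has been safely written:

```scala
import java.util.Properties

// Build consumer properties with auto-commit disabled, so offsets only
// advance when the application explicitly commits them. The broker
// address is passed in by the caller; "---" was elided in the question.
def manualCommitProps(bootstrapServers: String): Properties = {
  val props = new Properties()
  props.put("bootstrap.servers", bootstrapServers)
  props.put("group.id", "draw_attributes")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("auto.offset.reset", "earliest")
  // The key change versus the question's code: no automatic offset commits.
  props.put("enable.auto.commit", "false")
  props
}

// With auto-commit off, the poll loop commits explicitly only after the
// write to Hive succeeds (the Kafka/Spark calls are sketched as comments,
// since they need a live broker and SparkSession):
//
//   while (true) {
//     val records = kafkaConsumer.poll(10000).asScala.toSeq.map(_.value())
//     if (records.nonEmpty) {
//       val newDf = spark.read.json(records.toDS)
//       newDf.write.mode(SaveMode.Append).saveAsTable("dmart_dev.draw_attributes")
//       kafkaConsumer.commitSync() // blocks until offsets are committed
//     }
//   }
```

If the write throws before commitSync() runs, the offsets are not advanced and the batch is re-read on the next poll, so records may still be delivered more than once, but they are never silently skipped (at-least-once semantics).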