My use case is that I want to push Avro data from Kafka to HDFS. Camus seems to be the right tool, but I can't get it to work. I'm new to Camus and am trying to get camus-example working: https://github.com/linkedin/camus
Right now I'm trying to run the camus-example, but I'm still running into problems.
Code snippet for DummyLogKafkaProducerClient:
package com.linkedin.camus.example.schemaregistry;

import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageEncoder;
import com.linkedin.camus.example.records.DummyLog;

public class DummyLogKafkaProducerClient {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:6667");
        // props.put("serializer.class", "kafka.serializer.StringEncoder");
        // props.put("partitioner.class", "example.producer.SimplePartitioner");
        // props.put("request.required.acks", "1");

        ProducerConfig config = new ProducerConfig(props);
        Producer<String, byte[]> producer = new Producer<String, byte[]>(config);

        KafkaAvroMessageEncoder encoder = get_DUMMY_LOG_Encoder();
        for (int i = 0; i < 500; i++) {
            KeyedMessage<String, byte[]> data =
                    new KeyedMessage<String, byte[]>("DUMMY_LOG", encoder.toBytes(getDummyLog()));
            producer.send(data);
        }
    }

    public static DummyLog getDummyLog() {
        Random random = new Random();
        DummyLog dummyLog = DummyLog.newBuilder().build();
        dummyLog.setId(random.nextLong());
        dummyLog.setLogTime(new Date().getTime());
        Map<CharSequence, CharSequence> machoStuff = new HashMap<CharSequence, CharSequence>();
        machoStuff.put("macho1", "abcd");
        machoStuff.put("macho2", "xyz");
        dummyLog.setMuchoStuff(machoStuff);
        return dummyLog;
    }

    public static KafkaAvroMessageEncoder get_DUMMY_LOG_Encoder() {
        KafkaAvroMessageEncoder encoder = new KafkaAvroMessageEncoder("DUMMY_LOG", null);
        Properties props = new Properties();
        props.put(KafkaAvroMessageEncoder.KAFKA_MESSAGE_CODER_SCHEMA_REGISTRY_CLASS,
                "com.linkedin.camus.example.schemaregistry.DummySchemaRegistry");
        encoder.init(props, "DUMMY_LOG");
        return encoder;
    }
}
I also added a default no-arg constructor to DummySchemaRegistry, since it was throwing an InstantiationException:
package com.linkedin.camus.example.schemaregistry;

import org.apache.avro.Schema;
import org.apache.hadoop.conf.Configuration;

import com.linkedin.camus.example.records.DummyLog;
import com.linkedin.camus.example.records.DummyLog2;
import com.linkedin.camus.schemaregistry.MemorySchemaRegistry;

/**
 * This is a little dummy registry that just uses a memory-backed schema registry to store two dummy Avro schemas.
 * You can use this with camus.properties.
 */
public class DummySchemaRegistry extends MemorySchemaRegistry<Schema> {

    public DummySchemaRegistry(Configuration conf) {
        super();
        super.register("DUMMY_LOG", DummyLog.newBuilder().build().getSchema());
        super.register("DUMMY_LOG_2", DummyLog2.newBuilder().build().getSchema());
    }

    public DummySchemaRegistry() {
        super();
        super.register("DUMMY_LOG", DummyLog.newBuilder().build().getSchema());
        super.register("DUMMY_LOG_2", DummyLog2.newBuilder().build().getSchema());
    }
}
The exception trace I get after running the program:

Exception in thread "main" com.linkedin.camus.coders.MessageEncoderException: org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field id type:LONG pos:0 not set and has no default value
    at com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageEncoder.init(KafkaAvroMessageEncoder.java:55)
    at com.linkedin.camus.example.schemaregistry.DummyLogKafkaProducerClient.get_DUMMY_LOG_Encoder(DummyLogKafkaProducerClient.java:57)
    at com.linkedin.camus.example.schemaregistry.DummyLogKafkaProducerClient.main(DummyLogKafkaProducerClient.java:32)
Caused by: org.apache.avro.AvroRuntimeException: org.apache.avro.AvroRuntimeException: Field id type:LONG pos:0 not set and has no default value
    at com.linkedin.camus.example.records.DummyLog$Builder.build(DummyLog.java:214)
    at com.linkedin.camus.example.schemaregistry.DummySchemaRegistry.<init>(DummySchemaRegistry.java:16)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at java.lang.Class.newInstance(Class.java:438)
    at com.linkedin.camus.etl.kafka.coders.KafkaAvroMessageEncoder.init(KafkaAvroMessageEncoder.java:52)
    ... 2 more
Caused by: org.apache.avro.AvroRuntimeException: Field id type:LONG pos:0 not set and has no default value
    at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:151)
    at com.linkedin.camus.example.records.DummyLog$Builder.build(DummyLog.java:209)
    ... 9 more
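The root cause is visible at the bottom of the trace: `DummyLog.newBuilder().build()` throws because the schema's `id` field has no default value and was never set on the builder. A minimal self-contained mock of that builder behavior (this stand-in class is illustrative only, not Avro's actual `RecordBuilderBase`):

```java
// Mock (not Avro itself) of why DummyLog.newBuilder().build() throws:
// a generated Avro builder rejects build() when a field that has no
// schema default was never set.
public class BuilderDefaultDemo {

    static class DummyLogBuilder {
        private long id;        // stands in for the schema field "id" (type long, no default)
        private boolean idSet;  // tracks whether the caller actually set the field

        DummyLogBuilder setId(long id) {
            this.id = id;
            this.idSet = true;
            return this;
        }

        Long build() {
            if (!idSet) {
                // Mirrors Avro's "Field id type:LONG pos:0 not set and has no default value"
                throw new IllegalStateException(
                        "Field id type:LONG pos:0 not set and has no default value");
            }
            return id;
        }
    }

    public static void main(String[] args) {
        try {
            new DummyLogBuilder().build(); // fails, like DummyLog.newBuilder().build()
        } catch (IllegalStateException e) {
            System.out.println("build() without setId failed: " + e.getMessage());
        }
        Long id = new DummyLogBuilder().setId(42L).build(); // succeeds once the field is set
        System.out.println("build() after setId returned: " + id);
    }
}
```

The answers below work around this either by giving the schema fields defaults or by obtaining the schema without ever invoking the builder.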
Answer 0 (score: 1)
I think Camus expects the Avro schema to have default values. I changed dummyLog.avsc to the following and recompiled:
{
  "namespace": "com.linkedin.camus.example.records",
  "type": "record",
  "name": "DummyLog",
  "doc": "Logs for not so important stuff.",
  "fields": [
    {
      "name": "id",
      "type": "int",
      "default": 0
    },
    {
      "name": "logTime",
      "type": "int",
      "default": 0
    }
  ]
}
Let me know if it works for you.
Thanks, Ambarish
Answer 1 (score: 0)
You can default any string or long field as follows:
{
  "type": "record",
  "name": "CounterData",
  "namespace": "org.avro.usage.tutorial",
  "fields": [
    { "name": "word",  "type": ["string", "null"] },
    { "name": "count", "type": ["long", "null"] }
  ]
}
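One caveat from the Avro specification: a field's default value must match the first branch of its union, so if the intent is a null default, "null" has to come first and `"default": null` must be spelled out explicitly. A variant of the schema above with actual null defaults might look like:

```json
{
  "type": "record",
  "name": "CounterData",
  "namespace": "org.avro.usage.tutorial",
  "fields": [
    { "name": "word",  "type": ["null", "string"], "default": null },
    { "name": "count", "type": ["null", "long"],   "default": null }
  ]
}
```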
Answer 2 (score: 0)
Camus doesn't assume the schema will have default values. I recently ran into the same problem with Camus. Actually, the way the schema registry is used in the default example is not correct. I've made some modifications to the Camus code; you can check out https://github.com/chandanbansal/camus — there are minor changes to make it work. They don't have a decoder for plain Avro records, so I wrote one as well.
Answer 3 (score: 0)
I was running into this issue because I was initializing the registry like so:
super.register("DUMMY_LOG_2", LogEvent.newBuilder().build().getSchema());
When I changed it to:
super.register("logEventAvro", LogEvent.SCHEMA$);
that got me past the exception.
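The difference matters because Avro-generated classes expose their schema as a public static `SCHEMA$` field, so reading it never runs the record builder and never trips the missing-default check. A self-contained sketch of that pattern, with `LogEvent` mocked here as a plain class rather than the real generated one:

```java
// Sketch of the SCHEMA$ pattern used by Avro-generated classes.
// LogEvent below is a mock; a real generated class would expose an
// org.apache.avro.Schema instead of a raw JSON string.
public class SchemaDollarDemo {

    static class LogEvent {
        // Static schema: available without constructing a record,
        // so no field validation ever runs.
        public static final String SCHEMA$ =
                "{\"type\":\"record\",\"name\":\"LogEvent\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
    }

    public static void main(String[] args) {
        // Unlike LogEvent.newBuilder().build().getSchema(), this cannot
        // throw "Field ... not set and has no default value".
        System.out.println(LogEvent.SCHEMA$);
    }
}
```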
I also used Garry's com.linkedin.camus.etl.kafka.coders.AvroMessageDecoder.
I also found this blog (Alvin Jin's Notebook) very useful. It pinpoints every issue you could run into with the Camus example, and solves them!