I am using Kafka Connect in distributed mode. The command is: bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties
The worker configuration is:
bootstrap.servers=kafka1:9092,kafka2:9092,kafka3:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Kafka Connect restarts with no errors!
The topics connect-configs, connect-offsets, and connect-status are created. The topic mysiteview has already been created as well.
Then I create the Kafka connector via the REST API, as follows:
curl -X POST -H "Content-Type: application/json" --data '{
  "name": "hdfs-sink-mysiteview",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "3",
    "topics": "mysiteview",
    "hdfs.url": "hdfs://master1:8020",
    "topics.dir": "/kafka/topics",
    "logs.dir": "/kafka/logs",
    "format.class": "io.confluent.connect.hdfs.avro.AvroFormat",
    "flush.size": "1000",
    "rotate.interval.ms": "1000",
    "partitioner.class": "io.confluent.connect.hdfs.partitioner.DailyPartitioner",
    "path.format": "YYYY-MM-dd",
    "schema.compatibility": "BACKWARD",
    "locale": "zh_CN",
    "timezone": "Asia/Shanghai"
  }
}' http://kafka1:8083/connectors
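To confirm the connector was registered, the worker's REST API can be queried (a quick check, reusing the host and port from the command above):

# Lists all registered connectors; hdfs-sink-mysiteview should appear in the returned JSON array.
curl -X GET http://kafka1:8083/connectors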
When I produce data to the topic "mysiteview", the messages look like this:
{"f1":"192.168.1.1","f2":"aa.example.com"}
The Java producer code is as follows:
import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import com.alibaba.fastjson.JSON;

Properties props = new Properties();
props.put("bootstrap.servers", "kafka1:9092");
props.put("acks", "all");
props.put("retries", 3);
props.put("batch.size", 16384);
props.put("linger.ms", 30);
props.put("buffer.memory", 33554432);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<>(props);
Random rnd = new Random();
// events is the number of records to send; User is a simple POJO with f1/f2 fields.
for (long nEvents = 0; nEvents < events; nEvents++) {
    String site = "www.example.com";
    String ipString = "192.168.2." + rnd.nextInt(255);
    User u = new User();
    u.setF1(ipString);
    u.setF2(site + " " + rnd.nextInt(255));
    System.out.println(JSON.toJSONString(u));
    // The record is sent without a key; the value is a plain JSON string.
    producer.send(new ProducerRecord<>("mysiteview", JSON.toJSONString(u)));
    Thread.sleep(50);
}
producer.flush();
producer.close();
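As a sanity check that the records actually reach the topic, a console consumer can be run (a sketch, assuming the broker address from the producer configuration; exact flags depend on the Kafka version):

# Each printed line should be one JSON document such as {"f1":"...","f2":"..."}.
bin/kafka-console-consumer --bootstrap-server kafka1:9092 --topic mysiteview --from-beginning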
Then something strange happens. I can see the data in the Kafka logs, but there is no data in HDFS (no topic directory). I try the connector status command:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/status
The output is:
{"name":"hdfs-sink-mysiteview","connector":{"state":"RUNNING","worker_id":"10.255.223.178:8083"},"tasks":[{"state":"RUNNING","id":0,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":1,"worker_id":"10.255.223.178:8083"},{"state":"RUNNING","id":2,"worker_id":"10.255.223.178:8083"}]}
But when I check the task status with the following command:
curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/hdfs-sink-siteview-1
I get the result: "Error 404". All three tasks give the same error!
What went wrong?
Answer 0 (score: 0)
Without seeing the worker's logs, I am not sure with which exception exactly your HDFS connector instances are failing when you use the settings described above. However, I can spot a few issues in the configuration:
You start your Connect worker with bin/connect-distributed etc/schema-registry/connect-avro-distributed.properties. These properties set the key and value converters to the AvroConverter by default and require you to run a schema-registry service. If you have indeed edited the configuration in connect-avro-distributed.properties to use the JsonConverter instead, your HDFS connector will probably fail during the conversion of Kafka records to Connect's SinkRecord data type, just before it tries to export your data to HDFS.
Until recently, the HDFS connector could only export records in Avro format, which requires using the AvroConverter mentioned above. The capability to export records as JSON text files was added recently and will appear in version 4.0.0 of the connector (you may try this feature by checking out and building the connector from source).
At this point, my first suggestion would be to try importing your data with bin/kafka-avro-console-producer. Define their schema, confirm that the data is imported successfully with bin/kafka-avro-console-consumer, and then set your HDFS connector to use the AvroConverter as described above. The quickstart on the connector's page describes a very similar process, and maybe it would be a great starting point for your use case.
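For illustration, a minimal sketch of that workflow (the record schema below is an assumption based on the sample message in the question, the schema registry URL is hypothetical, and flags vary between Confluent Platform versions):

# Produce Avro records with an inline value schema; type one JSON document per line.
bin/kafka-avro-console-producer --broker-list kafka1:9092 --topic mysiteview \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema='{"type":"record","name":"SiteView","fields":[{"name":"f1","type":"string"},{"name":"f2","type":"string"}]}'

# Read the records back to confirm they were imported successfully.
bin/kafka-avro-console-consumer --bootstrap-server kafka1:9092 --topic mysiteview \
  --property schema.registry.url=http://localhost:8081 --from-beginning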
Answer 1 (score: 0)
Maybe you are just using the REST API incorrectly.
According to the documentation, the call should be:
/connectors/:connector_name/tasks/:task_id
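For example, the status of task 0 could be fetched like this (a sketch; the /status suffix returns the task's state, and the connector name is taken from the question above):

curl -X GET http://kafka1:8083/connectors/hdfs-sink-mysiteview/tasks/0/status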