I am working with Kafka Streams and am facing the following problem.
Details of what I have done so far:
I created the following topics, a stream, and a table:
./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic bptcus
./kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic address-elasticsearch-sink
Created a table and a stream over the topics created above:
CREATE table CUSTOMER_SRC (customerId VARCHAR,name VARCHAR, age VARCHAR, address VARCHAR) WITH (KAFKA_TOPIC='bptcus', VALUE_FORMAT='JSON', KEY='customerId');
CREATE stream ADDRESS_SRC (addressId VARCHAR, city VARCHAR, state VARCHAR) WITH (KAFKA_TOPIC='address-elasticsearch-sink', VALUE_FORMAT='JSON');
I can see the data as follows:
select * from customer_src;
1528743137610 | Parent-1528743137047 | Ron | 31 | [{"addressId":"1","city":"Fremont","state":"CA"},{"addressId":"2","city":"Dallas","state":"TX"}]
select * from address_src;
1528743413826 | Parent-1528743137047 | 1 | Detroit | MI
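For reference, the address column of the customer row above is itself a JSON-encoded array, which is why CUSTOMER_SRC can declare it as VARCHAR. A minimal sketch of composing such a value in plain Java (field names and values copied from the rows shown; no Kafka client involved, the class name is made up for illustration):

```java
public class CustomerValueSketch {
    public static void main(String[] args) {
        // The "address" field is a serialized JSON array embedded as a string,
        // matching the row shown for customer "Ron"
        String address = "[{\"addressId\":\"1\",\"city\":\"Fremont\",\"state\":\"CA\"},"
                + "{\"addressId\":\"2\",\"city\":\"Dallas\",\"state\":\"TX\"}]";
        String value = String.format(
                "{\"customerId\":\"%s\",\"name\":\"%s\",\"age\":\"%s\",\"address\":%s}",
                "Parent-1528743137047", "Ron", "31", address);
        System.out.println(value); // prints the JSON object for customer "Ron"
    }
}
```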
Created another stream by joining the table and the stream created above:
CREATE stream CUST_ADDR_SRC as select c.name , c.age , c.address, a.rowkey, a.addressId , a.city , a.state from ADDRESS_SRC a left join CUSTOMER_SRC c on c.rowkey=a.rowkey;
I can see the data in the CUST_ADDR_SRC stream as follows:
select * from cust_addr_src;
1528743413826 | Parent-1528743137047 | Ron | 31 | [{"addressId":"1","city":"Fremont","state":"CA"},{"addressId":"2","city":"Dallas","state":"TX"}] | Parent-1528743137047 | 1 | Detroit | MI
My problem:
How do I print the Kafka Streams input to the console?
Here is my code:
Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "cusadd-application");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "10.1.61.125:9092");
config.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "10.1.61.125:2181");
config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source = builder.stream("cust_addr_src");
source.foreach(new ForeachAction<String, String>() {
    @Override
    public void apply(String key, String value) {
        System.out.println("Stream key values are: " + key + ": " + value);
    }
});

// The topology must be built and started, or nothing is consumed
// (presumably present in the full program, since the log below shows a running StreamThread)
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
I do not see that output.
I can only see the following output:
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Resetting offset for partition cust_addr_src-0 to latest offset.
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.NetworkClient - Initiating connection to node 0 at hsharma-mbp15.local:9092.
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.bytes-sent
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.bytes-received
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name node-0.latency
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.NetworkClient - Completed connection to node 0
12:04:42.145 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.Fetcher - Fetched offset 0 for partition cust_addr_src-0
12:04:42.676 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name topic.cust_addr_src.bytes-fetched
12:04:42.680 [StreamThread-1] DEBUG org.apache.kafka.common.metrics.Metrics - Added sensor with name topic.cust_addr_src.records-fetched
12:04:45.150 [StreamThread-1] DEBUG org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Received successful heartbeat response for group cusadd-application
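The debug line resetting partition cust_addr_src-0 to the latest offset suggests the consumer group starts at the end of the topic, so rows written before the application started are never delivered. A minimal sketch of the two settings worth checking (plain Java; "auto.offset.reset" is the standard Kafka consumer property, and the topic-name point is an assumption to verify):

```java
import java.util.Properties;

public class OffsetResetSketch {
    public static void main(String[] args) {
        Properties config = new Properties();
        // Standard Kafka consumer property: "earliest" makes a consumer group
        // with no committed offsets read the topic from the beginning instead
        // of waiting only for new records (the default is "latest")
        config.put("auto.offset.reset", "earliest");

        // KSQL uppercases unquoted identifiers, so the topic backing the
        // CUST_ADDR_SRC stream may be "CUST_ADDR_SRC" rather than
        // "cust_addr_src" (an assumption -- verify with ./kafka-topics --list)
        String topic = "CUST_ADDR_SRC";

        System.out.println(topic + " -> " + config.getProperty("auto.offset.reset"));
        // prints "CUST_ADDR_SRC -> earliest"
    }
}
```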
Thanks in advance.
Answer (score 0):
I see two approaches:

You could declare the address column as type

ARRAY<STRUCT<addressId STRING, city STRING, state STRING>>

rather than as a string. You can then use the array's elements and the structs' fields to build the output, e.g.

ARRAY[
  STRUCT(
    addressId := address[0]->addressId,
    city := address_src->city,
    state := address[0]->state
  ),
  ... same for second element
]
The above creates an array containing two structs, with the new city set.
Of course, this only works if the array always contains exactly two elements. If the number of elements varies, you need a long-winded CASE statement that does something different depending on the size of the array. For example:
CASE
  WHEN ARRAY_LENGTH(address) = 1
    THEN ARRAY[STRUCT(addressId := address[0]->addressId, city := address_src->city, state := address[0]->state)]
  WHEN ARRAY_LENGTH(address) = 2
    THEN ARRAY[... with two elements ...]
  WHEN ARRAY_LENGTH(address) = 3
    THEN ARRAY[... with three elements ...]
END
And so on.
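For what it's worth, the same restructuring is straightforward on the Java Streams side, where one mapValues-style step can handle an address array of any length without a CASE branch per size. A sketch with plain collections standing in for the deserialized JSON (the Address class and merge helper are hypothetical, not from the question's code):

```java
import java.util.ArrayList;
import java.util.List;

public class AddressMergeSketch {
    // Hypothetical holder for one element of the "address" array
    static final class Address {
        final String addressId, city, state;
        Address(String addressId, String city, String state) {
            this.addressId = addressId;
            this.city = city;
            this.state = state;
        }
    }

    // Replace the element whose addressId matches the incoming update and keep
    // the rest unchanged; works for any array length, unlike the CASE form
    static List<Address> merge(List<Address> addresses, Address update) {
        List<Address> out = new ArrayList<>();
        for (Address a : addresses) {
            out.add(a.addressId.equals(update.addressId) ? update : a);
        }
        return out;
    }

    public static void main(String[] args) {
        // Values taken from the question's sample rows
        List<Address> current = List.of(
                new Address("1", "Fremont", "CA"),
                new Address("2", "Dallas", "TX"));
        List<Address> merged = merge(current, new Address("1", "Detroit", "MI"));
        System.out.println(merged.get(0).city + ", " + merged.get(1).city);
        // prints "Detroit, Dallas"
    }
}
```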