I am trying to fetch Facebook page data using the Graph API. Each post is larger than 1MB, while Kafka's default fetch.message.max.bytes is 1MB. I raised the Kafka limits from 1MB to 3MB by adding the following lines to consumer.properties and server.properties:
fetch.message.max.bytes=3048576 (consumer.properties)
message.max.bytes=3048576 (server.properties)
replica.fetch.max.bytes=3048576 (server.properties)
After adding the lines above, 3MB messages do land in the Kafka data log. But Storm cannot process them; it only reads up to the default size of 1MB. Which parameters should I add to my Storm topology so it can read 3MB messages from the Kafka topic? Do I need to increase a buffer size in Storm? I am not clear on this.
Here is my topology code:
String argument = args[0];
Config conf = new Config();
conf.put(JDBC_CONF, map);
conf.setDebug(true);
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
//set the number of workers
conf.setNumWorkers(3);
TopologyBuilder builder = new TopologyBuilder();
//Setup Kafka spout
BrokerHosts hosts = new ZkHosts("localhost:2181");
String topic = "year1234";
String zkRoot = "";
String consumerGroupId = "group1";
SpoutConfig spoutConfig = new SpoutConfig(hosts, topic, zkRoot, consumerGroupId);
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);
builder.setSpout("KafkaSpout", kafkaSpout, 1);
builder.setBolt("user_details", new Parserspout(), 1).shuffleGrouping("KafkaSpout");
builder.setBolt("bolts_user", new bolts_user(cp), 1).shuffleGrouping("user_details");
Thanks in advance.
Answer 0 (score: 0)
The SpoutConfig class extends KafkaConfig, which exposes all of the following settings:
public int fetchSizeBytes = 1024 * 1024;
public int socketTimeoutMs = 10000;
public int fetchMaxWait = 10000;
public int bufferSizeBytes = 1024 * 1024;
public MultiScheme scheme = new RawMultiScheme();
public boolean ignoreZkOffsets = false;
public long startOffsetTime = kafka.api.OffsetRequest.EarliestTime();
public long maxOffsetBehind = Long.MAX_VALUE;
public boolean useStartOffsetTimeIfOffsetOutOfRange = true;
public int metricsTimeBucketSizeInSecs = 60;
Note that these fields are public, so you can change them directly:
spoutConfig.fetchSizeBytes = 3048576;
spoutConfig.bufferSizeBytes = 3048576;
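For completeness, here is a minimal sketch of how those two overrides slot into the topology from the question, assuming the storm-kafka SpoutConfig/KafkaSpout API shown above. The package names assume Storm 1.x (org.apache.storm.*); on older 0.x releases the same classes live under storm.kafka.* and backtype.storm.* instead.

import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

public class LargeMessageTopology {
    public static void main(String[] args) {
        BrokerHosts hosts = new ZkHosts("localhost:2181");
        SpoutConfig spoutConfig = new SpoutConfig(hosts, "year1234", "", "group1");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Raise the spout's fetch and receive-buffer limits so it can pull the
        // 3MB messages the broker now accepts (values mirror message.max.bytes
        // and replica.fetch.max.bytes from the question).
        spoutConfig.fetchSizeBytes = 3048576;
        spoutConfig.bufferSizeBytes = 3048576;

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("KafkaSpout", new KafkaSpout(spoutConfig), 1);
        // ... attach the parser and JDBC bolts exactly as in the question ...
    }
}

The key point is simply to set both fields on the SpoutConfig before handing it to the KafkaSpout, as in the question's own topology code.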