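We are working on a project where we use Storm on HDInsight to analyze real-time data. We use Event Hubs for both input and output, and we are having trouble passing data through the topology. We currently have a JavaSpout as the input handler, a custom bolt (Bolt1) that is supposed to do some analysis on the data, and a JavaBolt that is supposed to take the analyzed data and send it to the output Event Hub. Passing data straight from the JavaSpout to the JavaBolt works like a charm, but as soon as we add the custom bolt, the data gets wrapped or otherwise mangled and no longer shows what it should. The output should be a JSON string, but instead we get something random like: [B@6d645e45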
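For what it's worth, `[B@6d645e45` has exactly the shape Java produces when `toString()` is called on a `byte[]`: `[B` is the JVM name for a byte-array class, followed by `@` and the object's hash code in hex. The sketch below reproduces that in plain Java. It assumes (this is not confirmed by the tutorial) that the Java bolt ends up stringifying a raw byte array instead of a decoded `String`; the JSON payload and the class name are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringDemo {
    // What Object.toString() returns for a byte[]: "[B@" + hex hash code.
    static String naive(byte[] payload) {
        return payload.toString();
    }

    // Decodes the bytes as UTF-8 text, i.e. what the output was expected to contain.
    static String decoded(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical JSON payload standing in for whatever the bolt emits.
        byte[] payload = "{\"value\":\"something\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(naive(payload));   // something like [B@6d645e45 (hash varies per run)
        System.out.println(decoded(payload)); // {"value":"something"}
    }
}
```

The point is only that the text never got corrupted; it was never decoded back from bytes on the way out.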
Most of the code comes from this tutorial: http://azure.microsoft.com/sv-se/documentation/articles/hdinsight-storm-develop-csharp-event-hub-topology/
This is our topology builder:
TopologyBuilder topologyBuilder = new TopologyBuilder("EventHubReaderTest");
int partitionCount = Properties.Settings.Default.EventHubPartitionCount;
JavaComponentConstructor constructor = JavaComponentConstructor.CreateFromClojureExpr(
    String.Format(@"(com.microsoft.eventhubs.spout.EventHubSpout. (com.microsoft.eventhubs.spout.EventHubSpoutConfig. " +
        @"""{0}"" ""{1}"" ""{2}"" ""{3}"" {4} ""{5}""))",
        Properties.Settings.Default.EventHubPolicyName,
        Properties.Settings.Default.EventHubPolicyKey,
        Properties.Settings.Default.EventHubNamespace,
        Properties.Settings.Default.EventHubNameInput,
        partitionCount,
        ""));
topologyBuilder.SetJavaSpout(
    "EventHubSpout",
    constructor,
    partitionCount);
List<string> javaSerializerInfo = new List<string>() { "microsoft.scp.storm.multilang.CustomizedInteropJSONSerializer" };
topologyBuilder.SetBolt(
        "bolten",
        Bolt1.Get,
        new Dictionary<string, List<string>>()
        {
            {Constants.DEFAULT_STREAM_ID, new List<string>(){"Event"}}
        },
        partitionCount).
    DeclareCustomizedJavaSerializer(javaSerializerInfo).
    shuffleGrouping("EventHubSpout");
JavaComponentConstructor constructorout =
    JavaComponentConstructor.CreateFromClojureExpr(
        String.Format(@"(com.microsoft.eventhubs.bolt.EventHubBolt. (com.microsoft.eventhubs.bolt.EventHubBoltConfig. " +
            @"""{0}"" ""{1}"" ""{2}"" ""{3}"" ""{4}"" {5}))",
            Properties.Settings.Default.EventHubPolicyName,
            Properties.Settings.Default.EventHubPolicyKey,
            Properties.Settings.Default.EventHubNamespace,
            "servicebus.windows.net", //suffix for servicebus fqdn
            Properties.Settings.Default.EventHubNameOutput,
            "true"));
topologyBuilder.SetJavaBolt(
        "EventHubBolt",
        constructorout,
        partitionCount).
    shuffleGrouping("bolten");
return topologyBuilder;
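When troubleshooting, it can help to see the literal Clojure expressions that the two `String.Format` calls hand to `CreateFromClojureExpr`. The sketch below mirrors those calls in Java so the result can be printed and inspected: every argument is wrapped in double quotes except the partition count (a Clojure number) and the final `true` of the bolt config (a Clojure boolean). The setting values used in `main` are placeholders, not real credentials, and the class and method names are hypothetical.

```java
public class ClojureExprDemo {
    // Mirrors the C# String.Format for the spout: quoted strings, unquoted
    // partition count, and an empty string as the last constructor argument.
    static String spoutExpr(String policyName, String policyKey, String namespace,
                            String inputHub, int partitionCount) {
        return String.format(
            "(com.microsoft.eventhubs.spout.EventHubSpout. (com.microsoft.eventhubs.spout.EventHubSpoutConfig. "
                + "\"%s\" \"%s\" \"%s\" \"%s\" %d \"\"))",
            policyName, policyKey, namespace, inputHub, partitionCount);
    }

    // Mirrors the C# String.Format for the bolt: the Service Bus suffix is a
    // quoted string, while the trailing true is passed as a Clojure boolean.
    static String boltExpr(String policyName, String policyKey, String namespace, String outputHub) {
        return String.format(
            "(com.microsoft.eventhubs.bolt.EventHubBolt. (com.microsoft.eventhubs.bolt.EventHubBoltConfig. "
                + "\"%s\" \"%s\" \"%s\" \"servicebus.windows.net\" \"%s\" true))",
            policyName, policyKey, namespace, outputHub);
    }

    public static void main(String[] args) {
        // Placeholder settings; the real values come from Properties.Settings.Default.
        System.out.println(spoutExpr("policy", "key", "mynamespace", "input", 8));
        System.out.println(boltExpr("policy", "key", "mynamespace", "output"));
    }
}
```

A quoting mistake here (for example quoting the partition count) would fail when the expression is evaluated, not when the C# code compiles, so printing the strings is a cheap check.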
This is the bolt that is supposed to do some work:
public Bolt1(Context ctx)
{
    this.ctx = ctx;
    Dictionary<string, List<Type>> inputSchema = new Dictionary<string, List<Type>>();
    inputSchema.Add("default", new List<Type>() { typeof(string) });
    Dictionary<string, List<Type>> outputSchema = new Dictionary<string, List<Type>>();
    outputSchema.Add("default", new List<Type>() { typeof(string) });
    this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));
    this.ctx.DeclareCustomizedDeserializer(new CustomizedInteropJSONDeserializer());
}
public static Bolt1 Get(Context ctx, Dictionary<string, Object> parms)
{
    return new Bolt1(ctx);
}
//this is where the magic should happen
public void Execute(SCPTuple tuple)
{
    string test = "something";
    //for now we are just trying to emit a string
    ctx.Emit(new Values(test));
}
We hope we have explained the problem clearly. We don't fully understand how the topology works, which makes it hard to troubleshoot.
EDIT: We solved it by declaring a deserializer in the topology:
List<string> javaSerializerInfo = new List<string>() { "microsoft.scp.storm.multilang.CustomizedInteropJSONSerializer" };
List<string> javaDeserializerInfo = new List<string>() { "microsoft.scp.storm.multilang.CustomizedInteropJSONDeserializer", "java.lang.String" };
topologyBuilder.SetBolt(
        "bolten",
        Bolt1.Get,
        new Dictionary<string, List<string>>()
        {
            {Constants.DEFAULT_STREAM_ID, new List<string>(){"Event"}}
        },
        partitionCount).
    DeclareCustomizedJavaSerializer(javaSerializerInfo).
    DeclareCustomizedJavaDeserializer(javaDeserializerInfo).
    shuffleGrouping("EventHubSpout");
In the custom C# bolt, we declared a serializer:
this.ctx.DeclareComponentSchema(new ComponentStreamSchema(inputSchema, outputSchema));
this.ctx.DeclareCustomizedDeserializer(new CustomizedInteropJSONDeserializer());
this.ctx.DeclareCustomizedSerializer(new CustomizedInteropJSONSerializer());
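To make the effect of the fix easier to picture, here is a toy Java model of the boundary between the C# bolt and the Java `EventHubBolt`. Everything in it is hypothetical and simplified (it is not SCP.NET's actual serializer or deserializer): the emitted value crosses the boundary as JSON-encoded bytes, and the receiving side can only rebuild a `String` when it knows the target type, which appears to be the role of the `"java.lang.String"` entry in `javaDeserializerInfo`. Without a declared type, the raw `byte[]` is passed on and stringifying it yields the `[B@...` output.

```java
import java.nio.charset.StandardCharsets;

public class InteropBoundaryToy {
    // Toy stand-in for the C# side serializer: encodes one emitted string as a
    // JSON string literal (escaping only backslash and quote) in UTF-8 bytes.
    static byte[] serialize(String value) {
        String json = "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Toy stand-in for the Java side deserializer. With String declared as the
    // target type it decodes the JSON literal; otherwise the raw bytes pass through.
    static Object deserialize(byte[] wire, Class<?> declaredType) {
        if (declaredType != String.class) {
            return wire;
        }
        String json = new String(wire, StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        // Skip the surrounding quotes; a backslash means "take the next char literally".
        for (int i = 1; i < json.length() - 1; i++) {
            char c = json.charAt(i);
            if (c == '\\') {
                i++;
                c = json.charAt(i);
            }
            out.append(c);
        }
        return out.toString();
    }

    // Toy stand-in for a Java bolt that writes field.toString() to its output.
    static String writeOut(Object field) {
        return field.toString();
    }

    public static void main(String[] args) {
        byte[] wire = serialize("something");
        // No declared type: the byte[] reaches the bolt and prints as [B@<hash>.
        System.out.println(writeOut(deserialize(wire, null)));
        // java.lang.String declared: the original text comes back.
        System.out.println(writeOut(deserialize(wire, String.class)));
    }
}
```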