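I need your help with this. I read that a spout is responsible for reading data or preparing it for processing in a bolt. So I wrote some code in the spout that opens a file and reads it line by line: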
import storm

class SimSpout(storm.Spout):
    # Not much to do here for such a basic spout
    def initialize(self, conf, context):
        # Open the file in read-only mode
        self.f = open('data.txt', 'r')
        # Keep the topology configuration and context
        self._conf = conf
        self._context = context
        storm.logInfo("Spout instance starting...")

    # Process the next tuple
    def nextTuple(self):
        # Emit every remaining line of the file in this single call
        for line in self.f.readlines():
            storm.logInfo("Emitting %s" % line)
            storm.emit([line])

# Start the spout when it's invoked
SimSpout().run()
Is this correct?
Answer 0 (score: 0)
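You are writing a spout, whose job in Storm is to emit tuples for downstream bolts to process.
A spout's nextTuple is expected to emit one tuple each time Storm calls it. Your code instead emits every line of the file in a single call. If one tuple is meant to be one line, keep an offset into the file: read the line at that offset, emit it, and then update offset = offset + 1. Something like the following: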
import storm

class SimSpout(storm.Spout):
    # Not much to do here for such a basic spout
    def initialize(self, conf, context):
        # Read the file once (read-only) and keep its lines in memory
        with open('data.txt', 'r') as f:
            self._lines = f.readlines()
        self._conf = conf
        self._context = context
        # Index of the next line to emit
        self._offset = 0
        storm.logInfo("Spout instance starting...")

    # Called repeatedly by Storm; emit at most one tuple per call
    def nextTuple(self):
        # Past the last line (EOF): nothing left to emit
        if self._offset >= len(self._lines):
            return
        line = self._lines[self._offset]
        storm.logInfo("Emitting %s" % line)
        storm.emit([line])
        self._offset = self._offset + 1

# Start the spout when it's invoked
SimSpout().run()
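To check the one-tuple-per-call behaviour without a Storm cluster, here is a minimal sketch. `LineSpoutSim`, `next_tuple` and the `emit` callback are hypothetical stand-ins for `storm.Spout`, `nextTuple` and `storm.emit`, not part of Storm's API. It also swaps the explicit integer offset for the open file's own read position via `readline()`, which is a different technique from the offset index above.

```python
import os
import tempfile
from typing import Callable, List


class LineSpoutSim:
    """Hypothetical stand-in for a Storm spout (no storm module needed).

    Each call to next_tuple() emits at most one line, mirroring how
    nextTuple() should behave in a real spout.
    """

    def __init__(self, path: str, emit: Callable[[List[str]], None]) -> None:
        # Keep the file open; its read position plays the role of the offset
        self._f = open(path, "r")
        self._emit = emit

    def next_tuple(self) -> None:
        line = self._f.readline()
        # readline() returns "" only at EOF (a blank line is "\n"),
        # so this check does not stop early on empty lines
        if line == "":
            return
        # Unlike the answer's code, strip the trailing newline before emitting
        self._emit([line.rstrip("\n")])

    def close(self) -> None:
        self._f.close()


if __name__ == "__main__":
    # Write a small sample file standing in for data.txt
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first line\nsecond line\n")
        path = tmp.name

    emitted: List[List[str]] = []
    spout = LineSpoutSim(path, emitted.append)
    for _ in range(4):  # more calls than lines: the extra calls emit nothing
        spout.next_tuple()
    spout.close()
    os.remove(path)
    print(emitted)  # [['first line'], ['second line']]
```

Reading with `readline()` avoids loading the whole file into memory, at the cost of holding the file handle open for the spout's lifetime.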