Logic error in a spout

Time: 2017-04-04 08:51:52

Tags: python apache-storm

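I need your help with this problem. I read that a spout is responsible for reading data or preparing it for processing in a bolt, so I wrote some code in the spout to open a file and read it line by line: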

import storm

class SimSpout(storm.Spout):
    # Not much to do here for such a basic spout
    def initialize(self, conf, context):
        # Open the file read-only
        self.f = open('data.txt', 'r')
        self._conf = conf
        self._context = context
        storm.logInfo("Spout instance starting...")

    # Process the next tuple
    def nextTuple(self):
        # Read every remaining line and emit each one
        for line in self.f.readlines():
            storm.logInfo("Emitting %s" % line)
            storm.emit([line])

# Start the spout when it's invoked
SimSpout().run()

Is this correct?

1 answer:

Answer 0 (score: 0)

You are writing a spout, whose responsibility in Storm is to emit tuples for the downstream bolts to process.

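The job of a spout's nextTuple is to emit one event each time it is called. Your code emits every line of the file in a single call. If one tuple is meant to be one line, keep an offset into the file, read the line at that offset, emit it, and then update offset = offset + 1. Something like the following: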

import storm

class SimSpout(storm.Spout):

  # Not much to do here for such a basic spout
  def initialize(self, conf, context):
    self._conf = conf
    self._context = context
    # Index of the next line to emit
    self._offset = 0
    storm.logInfo("Spout instance starting...")

  # Emit at most one tuple per call
  def nextTuple(self):
    # Re-read the file and pick the line at the current offset
    with open('data.txt', 'r') as f:
      lines = f.readlines()
    # Nothing left to emit once the offset passes the last line
    if self._offset >= len(lines):
      return
    line = lines[self._offset]
    storm.logInfo("Emitting %s" % line)
    storm.emit([line])
    self._offset = self._offset + 1
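
The one-line-per-call idea can also be tried outside Storm. The sketch below is only an illustration, not part of the Storm API: `LineSource`, `next_line` and `demo` are made-up names, and the input file is a temporary file created just for the demo. It assumes you are happy to keep the file handle open between calls, so that `readline()` can continue from where the previous call stopped and the file does not have to be re-read every time.

```python
import os
import tempfile


class LineSource:
    """Hypothetical helper: hands out one line per call, as nextTuple should."""

    def __init__(self, path):
        # Keep the handle open; the file object remembers its own position.
        self._f = open(path, "r")
        self.offset = 0  # number of lines handed out so far

    def next_line(self):
        """Return the next line without its newline, or None at end of file."""
        line = self._f.readline()
        if not line:
            return None
        self.offset += 1
        return line.rstrip("\n")

    def close(self):
        self._f.close()


def demo():
    # Write a small throwaway file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("first\nsecond\nthird\n")
        path = tmp.name
    source = LineSource(path)
    try:
        # Four calls: three lines, then None once the file is exhausted.
        return [source.next_line() for _ in range(4)], source.offset
    finally:
        source.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

In a spout, the body of `next_line` would sit inside nextTuple, and a `None` result would simply mean returning without emitting anything.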