Question

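Our project uses an APNs provider running on CentOS 6.4 to push offline messages.

The APNs provider simply reads from a Redis queue with BRPOP, reformats the data, and sends the resulting APNs messages to the Apple push service.

Recently I ran into a problem where the APNs provider stopped reading messages from the Redis queue, so I traced the process with strace:

Abnormal result: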

tcp        0      0 ::1:39688                   ::1:6379                    ESTABLISHED 29452/ruby          
[root@server]# strace -p 29452
Process 29452 attached - interrupt to quit
ppoll([{fd=56, events=POLLIN}], 1, NULL, NULL, 8

Normal result:

clock_gettime(CLOCK_MONOTONIC, {9266059, 349937955}) = 0
select(9, [8], NULL, NULL, {6, 0})      = 1 (in [8], left {3, 976969})
fcntl64(8, F_GETFL)                     = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, "*-1\r\n", 1024)                = 5
write(8, "*3\r\n$5\r\nbrpop\r\n$9\r\napn_queue\r\n$1"..., 37) = 37
fcntl64(8, F_GETFL)                     = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, 0x9a0e5d8, 1024)                = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {9266061, 374086306}) = 0
select(9, [8], NULL, NULL, {6, 0}^C <unfinished ...>
Process 20493 detached
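For readers unfamiliar with the bytes in the normal trace: the `write` call carries a BRPOP command in the Redis wire protocol (RESP), and the `*-1\r\n` read just before it is the null reply Redis sends when a BRPOP times out with no message. The sketch below rebuilds the request to show how it is framed. The helper name `encode_resp` is hypothetical, and the last argument is assumed to be the 1-second timeout passed in the code, since strace truncates the string; the resulting 37 bytes do match the length reported by `write`.

```ruby
# Encode a Redis command as a RESP array of bulk strings.
# encode_resp is a hypothetical helper, not part of any Redis client.
def encode_resp(*args)
  parts = args.map do |arg|
    s = arg.to_s
    # Each argument is sent as "$<byte length>\r\n<bytes>\r\n".
    "$#{s.bytesize}\r\n#{s}\r\n"
  end
  # The array header "*<count>\r\n" comes first.
  "*#{args.size}\r\n#{parts.join}"
end

# Assumption: the truncated third argument is the timeout of 1 second.
request = encode_resp("brpop", "apn_queue", 1)
puts request.inspect
puts request.bytesize
```

In the abnormal trace no such request/reply cycle is visible at all: the process is parked in a single `ppoll` with no timeout.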

Here is the relevant code:

loop do
  begin
    message = @redis.brpop(self.queue, 1)
    if message
      APN.log(:info, "---------->#{message} ----------->\n")
      @notification = APN::Notification.new(JSON.parse(message.last, :symbolize_names => true))

      send_notification
    end
  rescue Exception => e
    if e.class == Interrupt || e.class == SystemExit
      APN.log(:info, 'Shutting down...')
      exit(0)
    end

    APN.log(:error, "class: #{e.class} Encountered error: #{e}, backtrace #{e.backtrace}")

    APN.log(:info, 'Trying to reconnect...')
    client.connect!
    APN.log(:info, 'Reconnected')

    client.push(@notification)
  end
end
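To make the `message.last` step easier to follow: with redis-rb, `brpop` returns a two-element array of queue name and popped value, or `nil` when the timeout expires, which is why the loop checks `if message` first. A minimal sketch of the decoding step is below. The helper name `decode_popped` and the payload fields are hypothetical, since the real message format is not shown in the question.

```ruby
require "json"

# Decode the value returned by a BRPOP-style call.
# `message` is either nil (timeout) or [queue_name, raw_json].
# decode_popped is a hypothetical helper used only for illustration.
def decode_popped(message)
  return nil if message.nil?
  JSON.parse(message.last, symbolize_names: true)
end

# Hypothetical payload; the real fields are not given in the question.
sample = ["apn_queue", '{"alert":"hi","badge":1}']
puts decode_popped(sample).inspect
puts decode_popped(nil).inspect
```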

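The problem occurs irregularly, roughly once every month or two.

I believe the code logic is correct, and I suspect that something in the system's network is affecting the program's normal operation.

When I use pkill [pid] to signal the program, it returns to its normal condition and reads messages from the queue again.

I don't know how to analyze this problem, so for now I have to use cron to restart the program or send it a kill signal every day at dawn. :(

Does anyone have ideas on how to deal with this?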

Answer 1

In your abnormal strace result, ppoll is called with a NULL timeout, so it can block forever. The correct way is

const struct timespec timeout = { .tv_sec = 10, .tv_nsec = 0 };
struct pollfd myfds;
myfds.fd = fd;            /* descriptor being waited on */
myfds.events = POLLIN;
myfds.revents = 0;
int retresult = ppoll(&myfds, 1, &timeout, NULL);  /* returns 0 if the timeout expires */

This waits for at most 10 seconds: once the 10 seconds elapse, ppoll returns and execution continues with the next statement.
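Since the program in the question is Ruby rather than C, here is a rough Ruby analogue of the same idea, offered only as a sketch. `IO.select` with a timeout returns `nil` when nothing becomes readable in time, whereas passing `nil` as the timeout waits indefinitely, much like the NULL timeout in the trace. The helper name `readable_within?` is hypothetical, and a `UNIXSocket` pair stands in for the Redis connection; the Redis client does its own socket waiting internally, so this shows the principle rather than a drop-in fix.

```ruby
require "socket"

# Wait until `io` is readable, but for at most `seconds`.
# Returns true if data is ready, false if the wait timed out.
# readable_within? is a hypothetical helper for illustration.
def readable_within?(io, seconds)
  ready = IO.select([io], nil, nil, seconds)
  !ready.nil?
end

# A local socket pair stands in for the connection to Redis.
reader, writer = UNIXSocket.pair

puts readable_within?(reader, 0.2)  # nothing written yet, so the wait times out
writer.write("*-1\r\n")
puts readable_within?(reader, 0.2)  # data is pending, so the wait returns at once
```

With a bounded wait, a connection that has silently gone dead shows up as a timeout the program can react to, for example by reconnecting, instead of a process that sleeps until it is signalled.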

Redis operation in Ruby blocks in ppoll
