Java BlockingQueue producing/consuming incorrectly

Date: 2014-02-25 11:49:52

Tags: java multithreading blockingqueue

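I am working on a project in which I need to retrieve Twitter messages using the TwitterAPI, process them, and store them in a database. I am using a Producer/Consumer setup around a BlockingQueue, where the two sides behave as follows:

  • Producer: retrieves Twitter messages using the TwitterAPI and stores them in the BlockingQueue.
  • Consumer: takes an element from the queue, processes it, and stores it in the database.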

This is the Main class:

    // Creating shared object
    BlockingQueue<TwitterMessage> sharedQueue = new ArrayBlockingQueue<TwitterMessage>(1);

    // Creating Producer and Consumer Thread
    Thread prodThread = new Thread(new TwitterStreamProducer(sharedQueue));
    Thread consThread = new Thread(new TwitterStreamConsumer(sharedQueue));

    // Starting producer and Consumer thread
    prodThread.start();
    consThread.start();

The producer processes the TwitterAPI response and adds the resulting objects to the queue.

@Override
public void run() {
    while (true) {
        try {
            message = extractData(); // extract data from TwitterAPI response and return TwitterMessage object
            sharedQueue.put(message);

            System.out.println("Produced: " + message.getTwitterMessage());
        } catch (Exception ex) {
            Logger.getLogger(TwitterStreamProducer.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

The consumer looks like this:

private final BlockingQueue<TwitterMessage> sharedQueue;
private TwitterProcessor twitterProcessor;
private TwitterMessage twitterMessage;

public TwitterStreamConsumer(BlockingQueue<TwitterMessage> sharedQueue) {
    this.sharedQueue = sharedQueue;
    twitterProcessor = new TwitterProcessor();
}

@Override
public void run() {
    while (true) {
        try {
            twitterMessage = this.twitterProcessor.process(sharedQueue.take());
            if (twitterMessage.getTwitterMessage().length() > 1) {
                System.out.printf("Consumed: %s\n", twitterMessage.getTwitterMessage());
            }
        } catch (InterruptedException ex) {
            Logger.getLogger(TwitterStreamConsumer.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

This is the output I expect:

Produced: …twittermessage1… 
Consumed: …twittermessage1… 
Produced: …twittermessage2… 
Consumed: …twittermessage2… 
Produced: …twittermessage3… 
Consumed: …twittermessage3…
...

However, the output I actually get looks like this:

Produced: …twittermessage1…
Produced: …twittermessage2…  <= problem
Consumed: …twittermessage1…
Produced: …twittermessage3…
Consumed: …twittermessage3…
Consumed: …twittermessage3…  <= problem
Produced: …twittermessage4…  <= problem
Produced: …twittermessage5…
Consumed: …twittermessage5…

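As you can see, the Producer and Consumer sometimes overlap, and the Producer generates messages that are never consumed. Sometimes a message is consumed twice (occasionally even more than twice).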

EDIT1 This is what is printed on the console:

Produced: @1StevenGeorgiou thanks for the follow #ff
Processed: follow
Produced: @nmagliozzi6 @_PatrickKealy_ but of course!!!!!
Produced: @taylorgaglia Thanks Tayl  miss you tooo
Processed: tayl miss
Produced: Hate this who to follow tab in #twitter it's shows the most pathetic people you know. Accidently added one I had to act fast to unfollow
Processed: hate follow tabshow pathet peopl accid ad act fast unfollow

EDIT2 As John Vint suggested, I printed 'System.identityHashCode(sharedQueue.take())' and got the following:

Produced: …
Consumed: 1206857787
Produced: …
Consumed: 1206857787
…

Can anyone help me solve this problem?

Thanks!

2 answers:

Answer 0 (score: 1)

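The code behaves as it should: the execution order of threads is undefined. It is therefore entirely possible, and even likely, that the producer generates several messages before the previous one has been processed. This is actually a desirable feature, because it lets several threads handle the fetching (the producers), which blocks for some time, while fewer consumers, or even a single one, process the intermediate results.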
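The point about ordering can be made concrete with a minimal sketch. It assumes plain strings in place of TwitterMessage and a hypothetical `POISON_PILL` end marker, neither of which appears in the question. The `println` calls run outside the queue's lock, so the "Produced"/"Consumed" lines on the console may interleave in any order, while the queue itself still hands each element over exactly once and in FIFO order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class FifoOrderDemo {

    // Hypothetical end-of-stream marker so the consumer knows when to stop.
    private static final String POISON_PILL = "__END__";

    /** Runs one producer and one consumer over a capacity-1 queue; returns what was consumed. */
    static List<String> run(List<String> messages) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        List<String> consumed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (String m : messages) {
                    queue.put(m);
                    // The consumer may already have taken (and printed) m before this line runs.
                    System.out.println("Produced: " + m);
                }
                queue.put(POISON_PILL);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String m = queue.take();
                    if (m.equals(POISON_PILL)) {
                        return;
                    }
                    consumed.add(m); // only this thread writes; the caller reads after join()
                    System.out.println("Consumed: " + m);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the demo threads", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(java.util.Arrays.asList("msg1", "msg2", "msg3")));
    }
}
```

The returned list matches the input order on every run, even when the console shows two "Produced" lines in a row. A "Produced: msg2" line before "Consumed: msg1" is therefore not evidence of a lost or duplicated element.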

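In your code, however, you violate a basic rule of the producer/consumer pattern: the ratio between producers and consumers should be uneven. Since you currently have one producer/consumer pair for each message, the pattern only slows things down. You should either increase the number of fetchers (and accept asynchronous processing) or, if you do not want asynchronous processing, drop the pattern entirely and let the "consumer" fetch the messages itself.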

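Edit: If you use a concurrent queue such as LinkedBlockingQueue, that should solve your problem. Also have a look at the ExecutorService class, which simplifies running Runnables on threads.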
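A rough sketch of what that suggestion could look like, assuming several fetcher tasks feeding one consumer through a LinkedBlockingQueue. The message strings, the pool sizes, and the `POISON_PILL` marker are illustrative and not taken from the question. Note that ArrayBlockingQueue, which the question already uses, is thread-safe as well, so switching the queue type alone may not change the output; the sketch mainly shows the ExecutorService part.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorPipelineDemo {

    // Hypothetical end-of-stream marker; not part of the original code.
    private static final String POISON_PILL = "__END__";

    /** Starts several producers and one consumer; returns how many messages were processed. */
    static int run(int producers, int messagesPerProducer) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService producerPool = Executors.newFixedThreadPool(producers);
        ExecutorService consumerPool = Executors.newSingleThreadExecutor();

        // Single consumer: drains the queue until it sees the end marker.
        consumerPool.execute(() -> {
            try {
                while (true) {
                    String m = queue.take();
                    if (m.equals(POISON_PILL)) {
                        return;
                    }
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Several producers: each one stands in for a thread fetching messages.
        for (int p = 0; p < producers; p++) {
            final int id = p;
            producerPool.execute(() -> {
                for (int i = 0; i < messagesPerProducer; i++) {
                    queue.add("producer-" + id + "-msg-" + i); // unbounded queue, add() does not block
                }
            });
        }

        try {
            producerPool.shutdown();
            if (!producerPool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producers did not finish in time");
            }
            queue.put(POISON_PILL); // all real messages are queued before the marker
            consumerPool.shutdown();
            if (!consumerPool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumer did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pools", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("Processed: " + run(3, 100));
    }
}
```

The end marker is enqueued only after every producer has finished, so the single consumer sees all real messages before it stops.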

Answer 1 (score: 0)

I checked BlockingQueue with a simple Producer/Consumer example, which shows that it works correctly:

public static void main(String[] args) {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
    new Thread(new Producer(queue)).start();
    new Thread(new Consumer(queue)).start();
}

private static class Producer implements Runnable {

    private static final String[] MSGS = {
        "msg1", "msg2", "msg3", "msg4", "msg5",
        "msg6", "msg7", "msg8", "msg9", "msg10"
    };

    final BlockingQueue<String> sharedQueue;

    public Producer(BlockingQueue<String> queue) {
        sharedQueue = queue;
    }

    @Override
    public void run() {
        for (String msg : MSGS) {
            try {
                sharedQueue.put(msg);
                System.out.println("Produced: " + msg);
                // pause the producer so that the consumer gets a chance to run
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
                System.out.println("Producer was interrupted: " + msg);
                Thread.currentThread().interrupt(); // restore the flag and stop producing
                return;
            }
        }
    }

}

private static class Consumer implements Runnable {

    final BlockingQueue<String> sharedQueue;

    public Consumer(BlockingQueue<String> queue) {
        sharedQueue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String toProcess = sharedQueue.take();
                System.out.println("Consumed: " + toProcess);
            }
        } catch (InterruptedException e) {
            System.out.println("Consumer was interrupted!");
        }
    }

}

So I think the problem is probably introduced by the message you generate and print (I mean the twittermessage1 in your output), which comes from twitterProcessor.
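One way this could happen, consistent with the identical identityHashCode values in EDIT2, is that the same mutable message object is reused for every put(). That is an assumption, since the question does not show extractData(). The sketch below uses a hypothetical `Message` class in place of TwitterMessage and runs single-threaded so the effect is deterministic: when one instance is reused, every queue slot points at the same object and all of them show the last text written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SharedInstanceDemo {

    /** Hypothetical stand-in for TwitterMessage: a mutable holder with one text field. */
    static final class Message {
        private String text;

        Message(String text) {
            this.text = text;
        }

        void setText(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    /** Buggy variant: one instance is mutated and enqueued again for every text. */
    static List<String> reuseOneInstance(List<String> texts) {
        BlockingQueue<Message> queue = new ArrayBlockingQueue<>(Math.max(1, texts.size()));
        Message shared = new Message("");
        for (String t : texts) {
            shared.setText(t);  // overwrites what earlier queue entries will show
            queue.add(shared);  // the queue stores a reference, not a copy
        }
        return drain(queue);
    }

    /** Fixed variant: every text gets its own instance. */
    static List<String> freshInstancePerMessage(List<String> texts) {
        BlockingQueue<Message> queue = new ArrayBlockingQueue<>(Math.max(1, texts.size()));
        for (String t : texts) {
            queue.add(new Message(t));
        }
        return drain(queue);
    }

    /** Empties the queue and collects the text each element shows at that moment. */
    private static List<String> drain(BlockingQueue<Message> queue) {
        List<String> seen = new ArrayList<>();
        Message m;
        while ((m = queue.poll()) != null) {
            seen.add(m.getText());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> texts = java.util.Arrays.asList("msg1", "msg2", "msg3");
        System.out.println("reused instance: " + reuseOneInstance(texts));       // [msg3, msg3, msg3]
        System.out.println("fresh instances: " + freshInstancePerMessage(texts)); // [msg1, msg2, msg3]
    }
}
```

If this is what happens in the question's code, creating a new TwitterMessage for each extracted tweet, and keeping it in a local variable rather than a field, would be the thing to try first.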