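I am developing a project in which I need to retrieve Twitter messages using the Twitter API, process them, and store them in a database. I am using a Producer/Consumer BlockingQueue, whose elements behave as follows:
This is the Main class: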
// Creating shared object
BlockingQueue<TwitterMessage> sharedQueue = new ArrayBlockingQueue<TwitterMessage>(1);
// Creating Producer and Consumer Thread
Thread prodThread = new Thread(new TwitterStreamProducer(sharedQueue));
Thread consThread = new Thread(new TwitterStreamConsumer(sharedQueue));
// Starting producer and Consumer thread
prodThread.start();
consThread.start();
The producer processes the Twitter API response and adds the resulting object to the queue:
@Override
public void run() {
    while (true) {
        try {
            // extract data from TwitterAPI response and return TwitterMessage object
            message = extractData();
            sharedQueue.put(message);
            System.out.println("Produced: " + message.getTwitterMessage());
        } catch (Exception ex) {
            Logger.getLogger(TwitterStreamProducer.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
The consumer is as follows:
private final BlockingQueue<TwitterMessage> sharedQueue;
private TwitterProcessor twitterProcessor;
private TwitterMessage twitterMessage;

public TwitterStreamConsumer(BlockingQueue<TwitterMessage> sharedQueue) {
    this.sharedQueue = sharedQueue;
    twitterProcessor = new TwitterProcessor();
}

@Override
public void run() {
    while (true) {
        try {
            twitterMessage = this.twitterProcessor.process(sharedQueue.take());
            if (twitterMessage.getTwitterMessage().length() > 1) {
                System.out.printf("Consumed: %s\n", twitterMessage.getTwitterMessage());
            }
        } catch (InterruptedException ex) {
            Logger.getLogger(TwitterStreamConsumer.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
What I expect is:
Produced: …twittermessage1…
Consumed: …twittermessage1…
Produced: …twittermessage2…
Consumed: …twittermessage2…
Produced: …twittermessage3…
Consumed: …twittermessage3…
...
However, what I actually get is:
Produced: …twittermessage1…
Produced: …twittermessage2… <= problem
Consumed: …twittermessage1…
Produced: …twittermessage3…
Consumed: …twittermessage3…
Consumed: …twittermessage3… <= problem
Produced: …twittermessage4… <= problem
Produced: …twittermessage5…
Consumed: …twittermessage5…
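As you can see, the Producer and the Consumer sometimes overlap: the Producer generates several messages that have not yet been consumed, and sometimes a message is consumed twice (occasionally even more than twice).
EDIT1: This is what is printed on the console: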
Produced: @1StevenGeorgiou thanks for the follow #ff
Processed: follow
Produced: @nmagliozzi6 @_PatrickKealy_ but of course!!!!!
Produced: @taylorgaglia Thanks Tayl miss you tooo
Processed: tayl miss
Produced: Hate this who to follow tab in #twitter it's shows the most pathetic people you know. Accidently added one I had to act fast to unfollow
Processed: hate follow tabshow pathet peopl accid ad act fast unfollow
EDIT2: As John Vint suggested, I printed out 'System.identityHashCode(sharedQueue.take())' and got the following:
Produced: …
Consumed: 1206857787
Produced: …
Consumed: 1206857787
…
Can someone help me solve this problem?
Thanks!
Answer 0 (score: 1)
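The code behaves as it should: with threads, the order of execution is undefined. It is therefore entirely possible, and even likely, that the producer produces more than one message before the previous one has been processed. This is actually a desirable feature, because it lets you have several threads handle the extraction (the producers), which spends some of its time blocked, while fewer consumers, or even a single one, process the intermediate results.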
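The point about ordering can be checked with a minimal sketch. It assumes plain String messages instead of TwitterMessage, and the class name QueueOrderDemo is hypothetical. Because println is a separate step from put and take, a "Produced" line for the next message can appear before the "Consumed" line for the previous one. The queue itself still hands messages over in FIFO order, which the sketch verifies by recording what the consumer receives rather than relying on console output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueOrderDemo {

    // Runs one producer and one consumer over a capacity-1 queue and
    // returns the messages in the order the consumer received them.
    static List<String> run(int count) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        List<String> consumed = new ArrayList<>(); // written only by the consumer thread

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put("msg" + i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    consumed.add(queue.take()); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // after join, the consumer's writes are visible here
        return consumed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5)); // prints [msg1, msg2, msg3, msg4, msg5]
    }
}
```

With a single producer and a single consumer, the received order always matches the produced order, so interleaved log lines alone do not indicate a lost or duplicated message.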
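In your code, however, you go against a basic premise of the producer/consumer pattern: the number of producers and the number of consumers should differ. Since you currently have exactly one producer and one consumer handling every message, the pattern only slows things down. You should either increase the number of fetchers (and accept asynchronous processing) or, if you do not want asynchronous processing, drop the pattern entirely and let the "consumer" fetch the messages itself.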
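EDIT: Using a concurrent queue such as LinkedBlockingQueue should solve your problem. Also have a look at the ExecutorService class, which simplifies running Runnables on threads.
Answer 1 (score: 0)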
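A rough sketch of that suggestion, under stated assumptions: the class name PipelineSketch, the String messages, and the toLowerCase call standing in for real processing are all hypothetical. It runs several producers and one consumer on an ExecutorService over a LinkedBlockingQueue. Note that ArrayBlockingQueue is also thread-safe, so swapping the queue type mainly changes capacity behavior (a LinkedBlockingQueue created without a capacity is effectively unbounded).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PipelineSketch {

    // Starts `producers` producer tasks and one consumer task, and returns
    // everything the consumer processed. Order across producers is not fixed.
    static List<String> run(int producers, int perProducer) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(producers + 1);
        int total = producers * perProducer;
        try {
            // Single consumer: takes exactly `total` messages, then finishes.
            Callable<List<String>> consumer = () -> {
                List<String> processed = new ArrayList<>();
                for (int i = 0; i < total; i++) {
                    processed.add(queue.take().toLowerCase()); // placeholder "processing"
                }
                return processed;
            };
            Future<List<String>> result = pool.submit(consumer);

            // Several producers, each creating its own messages.
            for (int p = 0; p < producers; p++) {
                final int id = p;
                Callable<Void> producer = () -> {
                    for (int i = 0; i < perProducer; i++) {
                        queue.put("P" + id + "-MSG" + i);
                    }
                    return null;
                };
                pool.submit(producer);
            }

            return result.get(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow(); // release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(3, 2));
    }
}
```

Every message is processed exactly once, but messages from different producers may arrive in any relative order, which is the asynchronous behavior the answer asks you to accept.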
I checked BlockingQueue with a small Producer/Consumer example, which shows that it works correctly:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// The enclosing class name is chosen here for illustration only.
public class BlockingQueueDemo {

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        new Thread(new Producer(queue)).start();
        new Thread(new Consumer(queue)).start();
    }

    private static class Producer implements Runnable {
        private static final String[] MSGS = {
            "msg1", "msg2", "msg3", "msg4", "msg5",
            "msg6", "msg7", "msg8", "msg9", "msg10"
        };
        final BlockingQueue<String> sharedQueue;

        public Producer(BlockingQueue<String> queue) {
            sharedQueue = queue;
        }

        @Override
        public void run() {
            for (String msg : MSGS) {
                try {
                    sharedQueue.put(msg);
                    System.out.println("Produced: " + msg);
                    // pause the producer thread, so that the consumer can win the CPU
                    TimeUnit.SECONDS.sleep(1);
                } catch (InterruptedException e) {
                    System.out.println("Producer was interrupted: " + msg);
                    Thread.currentThread().interrupt(); // restore the flag and stop producing
                    return;
                }
            }
        }
    }

    private static class Consumer implements Runnable {
        final BlockingQueue<String> sharedQueue;

        public Consumer(BlockingQueue<String> queue) {
            sharedQueue = queue;
        }

        @Override
        public void run() {
            try {
                // runs until the thread is interrupted
                while (true) {
                    String toProcess = sharedQueue.take();
                    System.out.println("Consumed: " + toProcess);
                }
            } catch (InterruptedException e) {
                System.out.println("Consumer was interrupted!");
                Thread.currentThread().interrupt();
            }
        }
    }
}
So I think the problem may be introduced by the message you generate (I mean the twittermessage1 that you print out), which comes from twitterProcessor.
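One possible explanation that would fit EDIT2, where every take() reports the same identity hash code, is that a single mutable message object is being reused and overwritten rather than a new one being created per tweet. This is only a guess, since extractData() and TwitterProcessor are not shown in the question. The sketch below uses a hypothetical Message class as a stand-in for TwitterMessage and runs single-threaded so that the effect is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SharedInstanceDemo {

    // Hypothetical stand-in for TwitterMessage: a mutable holder.
    static final class Message {
        private String text;
        void setText(String text) { this.text = text; }
        String getText() { return text; }
    }

    // Enqueues the SAME instance every time, overwriting its text between puts.
    static List<String> reuseOneInstance(String[] texts) throws InterruptedException {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
        Message shared = new Message();
        for (String t : texts) {
            shared.setText(t);
            queue.put(shared);
        }
        return drain(queue);
    }

    // Enqueues a new instance per message, so earlier entries are never overwritten.
    static List<String> freshInstancePerMessage(String[] texts) throws InterruptedException {
        BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
        for (String t : texts) {
            Message m = new Message();
            m.setText(t);
            queue.put(m);
        }
        return drain(queue);
    }

    // Removes every queued message and collects the text each one holds now.
    private static List<String> drain(BlockingQueue<Message> queue) {
        List<String> seen = new ArrayList<>();
        Message m;
        while ((m = queue.poll()) != null) {
            seen.add(m.getText());
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] texts = {"a", "b", "c"};
        System.out.println(reuseOneInstance(texts));        // prints [c, c, c]
        System.out.println(freshInstancePerMessage(texts)); // prints [a, b, c]
    }
}
```

If the real code does share an instance like this, returning a newly constructed TwitterMessage for each tweet, and keeping the producer's and consumer's working variables local to run(), would be the first thing to try.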