Multithreading with Python and pymongo

Asked: 2015-05-21 20:14:54

Tags: python multithreading mongodb twitter pymongo

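Hi, I'm writing a program that classifies tweets about companies, already stored in MongoDB, as positive or negative. Once a tweet is classified, an integer counter for that company is updated according to the result.

I have code that does this, but I'd like to make the program multithreaded. I have no experience with threading in Python and have been following tutorials without any luck: the program just starts and exits without running any of the code.

I'd appreciate it if anyone could help. The program code, including my attempt at multithreading, is shown below.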

from textblob.classifiers import NaiveBayesClassifier
import pymongo
from datetime import datetime  # the code below calls datetime.strptime() and datetime.today()
from threading import Thread

train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg'),
    (':)', 'pos'),
    (':(', 'neg'),
    ('gr8', 'pos'),
    ('gr8t', 'pos'),
    ('lol', 'pos'),
    ('bff', 'neg'),
]

test = [
    'The beer was good.',
    'I do not enjoy my job',
    "I ain't feeling dandy today.",
    "I feel amazing!",
    'Gary is a friend of mine.',
    "I can't believe I'm doing this.",
]

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Yahoo', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']

# Create pos/neg counter variables for each company using dicts
vars = {}
for word in filterKeywords:
    vars[word + "SentimentOverall"] = 0


# Initialising the classifier
cl = NaiveBayesClassifier(train)


class TrainingClassification():
    def __init__(self):
        #creating the mongodb connection
        try:
            conn = pymongo.MongoClient('localhost', 27017)
            print("Connected successfully!!!")
            global db
            db = conn.TwitterDB
        except pymongo.errors.ConnectionFailure as e:
            print("Could not connect to MongoDB: %s" % e)

        thread1 = Thread(target=self.apple_thread, args=())
        thread1.start()
        thread1.join()
        print("thread finished...exiting")

    def apple_thread(self):
        appleSentimentText = []
        for record in db.Apple.find():
            if record.get('created_at'):
                created_at = record.get('created_at')
                dt = datetime.strptime(created_at, '%a %b %d %H:%M:%S +0000 %Y')
                if record.get('text') and dt > datetime.today():
                    appleSentimentText.append(record.get("text"))
        for targetText in appleSentimentText:
            classificationApple = cl.classify(targetText)
            if classificationApple == "pos":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] + 1
            elif classificationApple == "neg":
                vars["AppleSentimentOverall"] = vars["AppleSentimentOverall"] - 1

1 answer:

Answer 0 (score: 3)

The main problem with your code is here:

thread1.start()
thread1.join()

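When you call join on a thread, the currently running thread (in your case, the main thread) waits until that thread (here, thread1) has finished. So your code won't actually run any faster: it just starts one thread and waits for it. Because of the cost of creating the thread, it will even be slightly slower.

The right way to do multithreading is to start all the threads first and only then join them: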

thread1.start()
thread2.start()
thread1.join()
thread2.join()

In this code, threads 1 and 2 run concurrently.
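As a rough illustration of the start-then-join pattern, here is a minimal, self-contained sketch. The names `worker` and `run_concurrently` are made up for this example, and `time.sleep` stands in for an I/O-bound task such as a MongoDB query; none of this comes from your program.

```python
import threading
import time


def worker(name, delay, results):
    # Hypothetical stand-in for an I/O-bound job (e.g. reading one collection).
    time.sleep(delay)
    # Each thread writes to its own key, so no lock is needed in this sketch.
    results[name] = "done"


def run_concurrently(delay=0.2):
    results = {}
    threads = [
        threading.Thread(target=worker, args=(name, delay, results))
        for name in ("Apple", "Google")
    ]
    start = time.time()
    # Start every thread first ...
    for t in threads:
        t.start()
    # ... and only then wait for all of them.
    for t in threads:
        t.join()
    elapsed = time.time() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_concurrently()
    print(results)
    print("elapsed: %.2f s" % elapsed)
```

Since both workers sleep at the same time, the elapsed time should be close to one delay rather than two. If each thread were joined immediately after being started, the delays would add up instead.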

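Important: note that in Python this is a kind of "simulated" parallelism. Because the core of CPython is not thread-safe (mainly because of the way it manages memory and garbage collection), it uses a GIL (global interpreter lock), so only one thread in a process executes Python bytecode at any given time and the threads effectively share a single core. Threads still help with I/O-bound work, because the GIL is released while a thread waits on I/O. If you want true parallelism (for example, if your two threads are CPU-bound rather than I/O-bound), take a look at the multiprocessing module.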
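For the CPU-bound case, a minimal sketch with `multiprocessing.Pool` might look like the following. The functions `count_primes` and `count_in_parallel` are hypothetical examples, not part of your program; the prime counting is only a stand-in for CPU-heavy work such as classification.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound stand-in: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_in_parallel(limits, processes=2):
    # Each input is handled in a separate worker process, so the work
    # is not serialized by the GIL of a single interpreter.
    with Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The __main__ guard matters: on platforms that spawn worker processes,
    # the module is re-imported in each child.
    print(count_in_parallel([10000, 20000]))
```

Keep in mind that arguments and results are pickled and sent between processes, so this approach pays off mainly when each task does a substantial amount of computation.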