How to map keywords to certain categories in Python

Date: 2017-11-02 23:15:27

Tags: python twitter nlp nltk

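I'm currently working on a project that classifies tweets into categories based on the information they contain. For example, a tweet with the keywords "I think New York should ban smoking" would be classified under the "pollution" category, with negative sentiment.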
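One common starting point is a hand-built keyword lexicon: each category lists the keywords that signal it, and a tweet is assigned the categories whose keywords it contains. A minimal sketch, written for Python 3, assuming a hypothetical `CATEGORY_KEYWORDS` dictionary and a hypothetical `categorize` function (the categories and keywords are illustrative, not taken from the original post):

```python
import re
from collections import Counter

# Hypothetical lexicon: category -> keywords that signal it (illustrative only)
CATEGORY_KEYWORDS = {
    "pollution": {"smoking", "smoke", "smog", "emissions", "pollution"},
    "finance": {"money", "bank", "stocks", "tax"},
}

# Invert the lexicon once so each word is a single dict lookup
KEYWORD_TO_CATEGORIES = {}
for category, keywords in CATEGORY_KEYWORDS.items():
    for keyword in keywords:
        KEYWORD_TO_CATEGORIES.setdefault(keyword, set()).add(category)


def categorize(text):
    """Return a Counter of category -> number of matching keywords in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for word in words:
        for category in KEYWORD_TO_CATEGORIES.get(word, ()):
            counts[category] += 1
    return counts


print(categorize("I think New York should ban smoking"))  # Counter({'pollution': 1})
```

`categorize(text).most_common(1)` picks the dominant category, and an empty Counter means no category matched. Multi-word keywords such as "new york" would need phrase (n-gram) matching, which this sketch does not do.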

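I was able to get the sentiment analysis working, but I need some help creating a category database and linking it to Python. I'm also open to other solutions.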
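For the "category database" part, Python's standard-library `sqlite3` module is one low-effort option, since it needs no separate server. A minimal sketch, written for Python 3, assuming a hypothetical table `keyword_category` and hypothetical helpers `build_category_db` and `categories_for`; the schema is one possible design, not something from the original post:

```python
import sqlite3


def build_category_db(conn, lexicon):
    """Create a keyword -> category table and fill it from a dict of sets."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_category ("
        " keyword TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " PRIMARY KEY (keyword, category))"
    )
    rows = [(kw.lower(), cat) for cat, kws in lexicon.items() for kw in kws]
    conn.executemany(
        "INSERT OR IGNORE INTO keyword_category (keyword, category) VALUES (?, ?)",
        rows,
    )
    conn.commit()


def categories_for(conn, words):
    """Return the sorted distinct categories whose keywords appear in words."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    placeholders = ",".join("?" * len(words))  # one "?" per word, e.g. "?,?,?"
    cur = conn.execute(
        "SELECT DISTINCT category FROM keyword_category "
        "WHERE keyword IN (%s) ORDER BY category" % placeholders,
        words,
    )
    return [row[0] for row in cur.fetchall()]


# ":memory:" keeps the demo self-contained; pass a file path to persist the table
conn = sqlite3.connect(":memory:")
build_category_db(conn, {"pollution": {"smoking", "smog"}, "finance": {"money"}})
print(categories_for(conn, "I think New York should ban smoking".split()))  # ['pollution']
```

Storing one row per (keyword, category) pair lets a keyword belong to several categories, and the `?` placeholders let `sqlite3` quote the tweet's words safely instead of pasting them into the SQL string.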

My code so far is as follows. 1) stream.py: I use the following command to write live Twitter data to a text file: python stream.py > output.txt

import oauth2 as oauth    # python-oauth2 package (Python 2)
import urllib2 as urllib  # Python 2 standard library

api_key = 'xx'
api_secret = 'xx'

access_token_key = 'x-x'
access_token_secret = 'x'

_debug = 0

oauth_token    = oauth.Token(key=access_token_key, secret=access_token_secret)
oauth_consumer = oauth.Consumer(key=api_key, secret=api_secret)

signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()

http_handler  = urllib.HTTPHandler(debuglevel=_debug)
https_handler = urllib.HTTPSHandler(debuglevel=_debug)

def twitterreq(url, method, parameters):
  '''
  Construct, sign, and open a twitter request
  using the hard-coded credentials above.
  '''
  req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                             token=oauth_token,
                                             http_method=method,
                                             http_url=url,
                                             parameters=parameters)

  req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)

  # POST: signed parameters go in the request body; GET: they go in the URL query string
  if method == "POST":
    encoded_post_data = req.to_postdata()
  else:
    encoded_post_data = None
    url = req.to_url()

  opener = urllib.OpenerDirector()
  opener.add_handler(http_handler)
  opener.add_handler(https_handler)

  response = opener.open(url, encoded_post_data)

  return response

# locations=-74,40,-73,41 is a bounding box around New York City (southwest corner first)
def fetchsamples():
  url = "https://stream.twitter.com/1.1/statuses/filter.json?track=money&locations=-74,40,-73,41"
  parameters = []
  response = twitterreq(url, "GET", parameters)
  for line in response:
    print(line.strip())

if __name__ == '__main__':
  fetchsamples()
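In the streaming URL above, `locations` takes bounding boxes as longitude,latitude pairs with the southwest corner first, so `-74,40,-73,41` roughly covers New York City. Twitter's documentation for this endpoint also states that `track` and `locations` are combined with OR, so this URL returns tweets that mention "money" *or* come from the box, not only tweets that satisfy both. If only geotagged NYC tweets are wanted, a client-side check can be added. A minimal sketch, written for Python 3, assuming hypothetical helpers `parse_locations`, `in_bbox`, and `tweet_in_area`, and relying on the tweet's `coordinates` field using GeoJSON `[longitude, latitude]` order:

```python
def parse_locations(param):
    """Split a 'locations' value into (sw_lon, sw_lat, ne_lon, ne_lat) boxes."""
    values = [float(v) for v in param.split(",")]
    if len(values) % 4 != 0:
        raise ValueError("locations needs groups of 4 numbers")
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]


def in_bbox(lon, lat, boxes):
    """True if the point (lon, lat) lies inside any of the boxes."""
    return any(sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat
               for sw_lon, sw_lat, ne_lon, ne_lat in boxes)


def tweet_in_area(tweet, boxes):
    """Check a decoded tweet dict; tweets without exact coordinates return False."""
    point = tweet.get("coordinates")
    if not point:
        return False
    lon, lat = point["coordinates"]  # GeoJSON order: [longitude, latitude]
    return in_bbox(lon, lat, boxes)


NYC = parse_locations("-74,40,-73,41")
print(in_bbox(-73.97, 40.78, NYC))  # a point in Manhattan: True
```

This check is stricter than the stream's own filter, which also matches tweets whose `place` bounding box intersects the requested area even when exact coordinates are absent.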

A tweet's sentiment is computed as the sum of the sentiment scores of the words in the tweet. To get the tweet sentiment, run: python tweet_sentiment.py AFINN-111.txt tweet_file
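The summing rule can be sketched as follows. One detail worth checking: AFINN-111 terms are lowercase, while the `filterTweet` function in the question keeps the original casing, so a word like "Bad" would score 0 unless it is lowercased before the lookup. A minimal sketch, written for Python 3, assuming hypothetical `load_scores` and `tweet_sentiment` functions and an in-memory stand-in for the AFINN file (the scores below are toy values, not copied from AFINN-111):

```python
import io


def load_scores(lines):
    """Parse tab-separated 'term, score' lines (AFINN layout) into a dict."""
    scores = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        term, score = line.rsplit("\t", 1)
        scores[term] = int(score)  # AFINN scores are integers from -5 to +5
    return scores


def tweet_sentiment(words, scores):
    """Sum the scores of the words; unknown words count as 0."""
    return sum(scores.get(w.lower(), 0) for w in words)


afinn = io.StringIO("bad\t-3\ngood\t3\nban\t-2\n")  # toy scores, not real AFINN values
scores = load_scores(afinn)
print(tweet_sentiment("I think New York should ban smoking".split(), scores))  # -2
```

With the real file, `with open("AFINN-111.txt") as f: scores = load_scores(f)` builds the same kind of dictionary.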

Here is the link I uploaded for AFINN-111.txt: http://s000.tinyupload.com/index.php?file_id=62473255612293859764

Here is the code for tweet_sentiment.py:
import sys
import json
import re

def calcScoreFromTerm(termScoreFile):   # returns a dictionary with term-score values
    scores = {}
    for line in termScoreFile:
        # Each AFINN-111 line is "<term>\t<score>"
        term, score = line.split("\t")
        scores[term] = float(score)
    return scores

def getTweetText(tweet_file):   # returns a list of all tweets
    tweets = []
    for line in tweet_file:
        line = line.strip()
        if not line:
            # Skip blank lines (keep-alive newlines from the stream); json.loads fails on them
            continue
        jsondata = json.loads(line)
        if "text" in jsondata:
            tweets.append(jsondata["text"])
    tweet_file.close()
    return tweets

def filterTweet(et):
    # Replace punctuation and other non-alphanumeric chars with spaces
    pattern = re.compile('[^A-Za-z0-9]+')
    et = pattern.sub(' ', et)

    words = et.split()

    # Filter unnecessary words. Build a new list: calling words.remove(w)
    # while iterating over words skips the element after each removal.
    return [w for w in words
            if not (w.startswith("RT") or w.startswith("www") or w.startswith("http"))]
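To get output of the form described in the question (a category plus a sentiment), the keyword lookup and the score sum can run on the same word list, with the score mapped to a label by its sign. A minimal sketch, written for Python 3, assuming hypothetical `tokenize` and `classify_tweet` functions, a keyword lexicon passed in as a dict, and toy scores. It removes URLs *before* stripping punctuation, because stripping first turns `http://t.co/x` into the separate tokens `http`, `t`, `co`, `x`. Treating a score of exactly 0 as "neutral" is a design choice, not part of the original code:

```python
import re


def tokenize(text):
    """Drop URLs, keep alphanumeric runs, lowercase, and drop the 'rt' marker."""
    text = re.sub(r"(?:https?://|www\.)\S+", " ", text)
    words = re.sub(r"[^A-Za-z0-9]+", " ", text).lower().split()
    return [w for w in words if w != "rt"]


def classify_tweet(text, scores, lexicon):
    """Return (sorted categories, sentiment score, sentiment label) for one tweet."""
    words = tokenize(text)
    score = sum(scores.get(w, 0) for w in words)
    categories = sorted({cat for cat, keywords in lexicon.items()
                         for w in words if w in keywords})
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return categories, score, label


toy_scores = {"ban": -2, "great": 3}              # toy values, not real AFINN scores
toy_lexicon = {"pollution": {"smoking", "smog"}}  # hypothetical category lexicon
print(classify_tweet("RT I think New York should ban smoking http://t.co/x",
                     toy_scores, toy_lexicon))
# (['pollution'], -2, 'negative')
```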

0 answers