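I am trying to build a dataset of food tweets labelled as healthy or unhealthy. I wrote two scripts that stream tweets matching keywords I specify, and then I run sentiment analysis on them so that I can sort them into healthy and unhealthy. However, TextBlob's sentiment analysis is not very satisfactory for this, and the script also picks up tweets that do not contain the keywords. If anyone knows of an existing food tweet dataset, that would be helpful.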
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import os
from textblob import TextBlob
import json
ckey = "xxx"
csecret = "xx"
atoken = "xx"
asecret = "xx"
class listener(StreamListener):
    def on_data(self, data):
        try:
            # Pull the tweet text out of the raw JSON string
            tweet = data.split(',"text":"')[1].split('","source')[0]
            print(tweet)
            # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive)
            analysis = TextBlob(tweet)
            polarity = analysis.sentiment.polarity
            print(polarity)
            if polarity < 0:
                # Negative tweets go to out1.csv
                saveThis = tweet + '::' + str(polarity)
                out = open('out1.csv', 'a')
                out.write(saveThis)
                out.write('\n')
                out.close()
            elif polarity > 0:
                # Positive tweets go to out2.csv
                saveThis = tweet + '::' + str(polarity)
                out = open('out2.csv', 'a')
                out.write(saveThis)
                out.write('\n')
                out.close()
        except Exception as e:
            print('failed on_data, ' + str(e))
            time.sleep(5)
        # Keep the stream open, including for neutral tweets and after errors
        return True
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["vegetable soup", "fruits", "green tea", "vegetables", "fresh juice", "salad","sea food"], languages=['en'])
Answer 0 (score: 0)
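I am new to programming, but I ran into a similar problem. What I found, which may not be correct but works for my program, was to read the keyword with raw input, pass that variable through to the listener, and then run an `if keyword in data['text']` check. I also defined the text near the top as a variable with encode('utf-8').
If the keyword is not in the text, just pass.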
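A minimal sketch of that check, in case it helps. It is not the code from my program: the helper names `tweet_text`, `matching_keywords` and `keep_tweet` are made up for illustration, and the keyword list is simply copied from the `track` argument in the question. It assumes the payload is the raw JSON string that `on_data` receives, and that long tweets may carry their full text under `extended_tweet.full_text`.

```python
import json

# Same phrases as the track list in the question
KEYWORDS = ["vegetable soup", "fruits", "green tea", "vegetables",
            "fresh juice", "salad", "sea food"]


def tweet_text(raw):
    """Return the tweet text from a raw JSON payload, or None if there is none."""
    try:
        payload = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None
    # Prefer the untruncated text when the payload provides it
    extended = payload.get("extended_tweet") or {}
    return extended.get("full_text") or payload.get("text")


def matching_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur in text as substrings, ignoring case."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def keep_tweet(raw, keywords=KEYWORDS):
    """True only when the payload has text containing at least one keyword."""
    text = tweet_text(raw)
    return bool(text) and bool(matching_keywords(text, keywords))
```

Inside `on_data` you could then start with `if not keep_tweet(data): return True`, so tweets without a keyword are skipped before the sentiment step. This is a plain substring match, so "salad" also matches "salads"; tighten it if that is too loose for your dataset.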