I am trying to scrape Hindi tweets from Twitter with Python 2.7, using Hindi emotion words (for example खुशी, गुस्सा) as the search terms. I am using the Streaming API, and my code is below:
import codecs
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream

access_token = "xxxxxxxxxxxxxxxx"
access_token_secret = "xxxxxxxxxxxxxxx"
consumer_key = "xxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxx"

class StdOutListener(StreamListener):
    def on_data(self, data):
        print data
        saveFile = codecs.open('TweetPrjkhushh.txt', 'a', 'utf-8')
        saveFile.write(data)
        saveFile.write('\n')
        saveFile.close()
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    t = u"खुशी"
    stream.filter(languages=["hi"], track=[t])
The tweet text comes back as Unicode escape sequences:
{"text":"RT @guru9899: \u092f\u0947 \u092c\u0947\u091c\u093e\u0928 \u0928\u0947 \u092c\u094b\u0932\u093e \u092f\u093e @abpnewshindi \u0915\u0940 \u092e\u0941\u0939\u0940\u092e \u0939\u0948 ??? \u0939\u093e\u0925 \u0935\u093e\u092a\u0938 \u092d\u0940 \u0924\u094b \u0916\u0940\u0902\u091a \u0938\u0915\u0924\u0947 \u0925\u0947 ??? \u091c\u092c\u0930\u0926\u0938\u094d\u0924\u0940 \u0925\u094b\u0921\u093c\u0940 \u0939\u0948 \ud83d\ude02\ud83d\ude02\ud83d\ude02 https:\/\/t.co\/BE0gSEj\u2026"}
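When I open the file where the tweets are saved, I want the text to be displayed in Hindi script, but using codecs with utf-8 encoding when saving did not help. What am I missing here?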
Answer 0 (score: 0)
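data is not a dictionary here: tweepy passes on_data the raw JSON string of each message, so data["text"] would fail on it. The \uXXXX sequences you see are JSON escapes for the Hindi characters. Parse the string with json.loads first (add import json at the top of the file), then write the decoded text field:
def on_data(self, data):
    # Parsing the raw JSON turns the \uXXXX escapes into real characters.
    tweet = json.loads(data)
    text = tweet.get("text")
    if text is None:
        return True  # not a tweet (e.g. a delete or limit notice)
    print text
    saveFile = codecs.open('TweetPrjkhushh.txt', 'a', 'utf-8')
    saveFile.write(text)
    saveFile.write('\n')
    saveFile.close()
    return True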
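If you already have a file full of raw JSON lines like the one in the question, you probably do not need to scrape again. The sketch below assumes each non-blank line of the saved file is one JSON message from the stream; it parses each line and writes only the decoded text to a second file. The helper names extract_text and convert_file are made up for this illustration, and the code uses io.open so it should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import json


def extract_text(raw_line):
    """Parse one raw JSON message and return its "text" field, or None."""
    message = json.loads(raw_line)
    return message.get("text")


def convert_file(src_path, dst_path):
    """Write the tweet text of every JSON line in src_path to dst_path (UTF-8).

    Returns the number of tweets written.
    """
    written = 0
    with io.open(src_path, "r", encoding="utf-8") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for line in src:
                line = line.strip()
                if not line:
                    continue  # blank separator lines carry no data
                try:
                    text = extract_text(line)
                except ValueError:
                    continue  # assumption: silently skip lines that are not valid JSON
                if text is not None:
                    dst.write(text + u"\n")
                    written += 1
    return written
```

For example, convert_file('TweetPrjkhushh.txt', 'tweets_hindi.txt') would read the file from the question and write the readable text to tweets_hindi.txt (an output name chosen here only for illustration). Whether the result shows up as Hindi still depends on opening it in an editor that reads UTF-8 and has a Devanagari font.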