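I am building a web application around tweets fetched from the Twitter REST API. Non-ASCII text such as Korean, Chinese, and other Asian-language characters displays correctly when printed in the Python console, but once I store it in a SQL database the string value turns into " ?? ??? ???" or something similar.
I am using the Twython module. Here is the code that fetches the tweets; it works, and the characters display correctly when I print the tweets: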
from twython import Twython
import json

APP_KEY = 'abcdfefdags'
APP_SECRET = 'abcdefghdfa'
SEARCH_QUERY = 'russia'
SEARCH_COUNT = 3

twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
twitter1 = Twython(APP_KEY, access_token=ACCESS_TOKEN)

def getTweetQuery():
    return SEARCH_QUERY

def getTweetTextDict():
    tweetTempList = []
    data = []
    listOfTweets = dict()
    data = twitter1.search(q=SEARCH_QUERY, count=SEARCH_COUNT)
    for x in range(0, SEARCH_COUNT):
        tweetData = dict()
        s = (data['statuses'][x]['text'])
        tweetData['text'] = s
        s = (data['statuses'][x]['created_at'])
        tweetData['created_at'] = s
        s = (data['statuses'][x]['user']['name'])
        tweetData['name'] = s
        s = (data['statuses'][x]['user']['profile_image_url'])
        tweetData['profile_image_url'] = s
        listOfTweets[x] = tweetData
    return listOfTweets
Here is the code that stores the data in the SQL database:
import mysql.connector
from firstsite.website import twit

class SQLDataSystem:
    def insertNewTweets(self):
        cnx = mysql.connector.connect(user='djangouser', password='1234',
                                      host='127.0.0.1',
                                      database='django_db')
        cursor = cnx.cursor()
        dataPacket = twit.getTweetTextDict()
        dataPacketLength = len(dataPacket.keys())
        for x in range(0, dataPacketLength):
            tweet = dataPacket[x]['text']
            tweetTime = dataPacket[x]['created_at']
            twitterName = dataPacket[x]['name']
            twitterPicture = dataPacket[x]['profile_image_url']
            add_tweet = ("INSERT INTO website_tweets " +
                         "(tweet, tweetTime, twitterName, twitterPicture) " +
                         "VALUES (%s, %s, %s, %s)")
            arguments = (tweet, tweetTime, twitterName, twitterPicture)
            cursor.execute(add_tweet, arguments)
        cnx.commit()
When I check the database with SELECT * FROM website_tweets; and also when I retrieve the row and print it from Python, a string that should be ' @nytvideo @@ KOREA:왜이테러리스트들은구속하지않나요??' comes back as ' @nytvideo @@ KOREA:? ? ??????? ???? ?????'. How can I fix this?
Answer 0 (score: 0)
The problem is not in your script but in the database settings. Take a look at http://dev.mysql.com/doc/refman/5.1/en/faqs-cjk.html#qandaitem-A-11-1-2
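To make the "database settings" point concrete, here is a minimal sketch of why the text can come back as question marks, plus the statements one might run to inspect and convert the table. It assumes the table currently uses a single-byte character set such as latin1 (the question does not show the table definition, so this is a guess) and that the server supports utf8mb4 (MySQL 5.5.3 or later; on an older server utf8 would be the substitute). The helper name charset_statements is hypothetical.

```python
# -*- coding: utf-8 -*-
# Part 1: why '?' shows up. A single-byte charset such as latin1 has no
# representation for Hangul, so a lossy conversion substitutes '?' for
# every character it cannot encode.
original = u"KOREA: 왜이테러"
lossy = original.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)


# Part 2: statements to inspect and convert the table (hypothetical helper).
# The table name is interpolated directly, so it must come from trusted code,
# never from user input.
def charset_statements(table):
    return [
        # Shows the client, connection, database and server character sets.
        "SHOW VARIABLES LIKE 'character_set%'",
        # Shows the character set the table and its columns were created with.
        "SHOW CREATE TABLE {0}".format(table),
        # Converts the table and its text columns. Assumption: utf8mb4 is
        # available on this server.
        "ALTER TABLE {0} CONVERT TO CHARACTER SET utf8mb4 "
        "COLLATE utf8mb4_unicode_ci".format(table),
    ]


for statement in charset_statements("website_tweets"):
    print(statement)
```

Rows that were already stored as '?' cannot be recovered by converting the table, because the original characters were discarded at insert time; those tweets would have to be fetched and inserted again.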
Answer 1 (score: 0)
The three things to check are:
use_unicode=True
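For the connection-side setting named above, a minimal sketch follows, assuming the mysql-connector-python package used in the question. charset and use_unicode are keyword arguments accepted by mysql.connector.connect; choosing utf8mb4 is my assumption, the credentials are copied from the question, and build_connection_config and open_connection are hypothetical helpers. Only this one item is illustrated here.

```python
def build_connection_config(user, password, host, database):
    """Return keyword arguments for mysql.connector.connect()."""
    return {
        "user": user,
        "password": password,
        "host": host,
        "database": database,
        # Assumption: the server and the table both support utf8mb4.
        "charset": "utf8mb4",
        # Ask the connector to return text columns as unicode strings.
        "use_unicode": True,
    }


def open_connection(config):
    # Imported here so the config helper can be used without the driver
    # installed; calling this function needs a reachable MySQL server.
    import mysql.connector
    return mysql.connector.connect(**config)


config = build_connection_config("djangouser", "1234", "127.0.0.1", "django_db")
print(config["charset"], config["use_unicode"])
```

In insertNewTweets, the existing mysql.connector.connect(...) call would then become open_connection(config). The connection charset only helps if the table itself can store the characters, so it is worth checking both ends.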