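How do I store streamed coordinates from the tweepy API in a MySQL DB?

Time: 2017-01-26 21:31:17

Tags: python mysql tweepy

I am using this script to stream tweets from Twitter with tweepy, and I have run into a problem with the coordinates parameter.

Whenever I receive a tweet that has coordinates, I get this error: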

(1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \': "\'Point\'", u\'coordinates\': \'(28.5383355,-81.3792365)\'})\' at line 1')

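Also, my coordinates condition, which should store only tweets that carry a geolocation, has no effect. All incoming tweets seem to be stored in the database.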

import tweepy
import json
import MySQLdb
from dateutil import parser

WORDS = ['#bigdata', '#AI', '#datascience', '#machinelearning', '#ml', '#iot']

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

HOST = ""
USER = ""
PASSWD = ""
DATABASE = ""

# This function takes the 'created_at', 'text', 'screen_name', 'tweet_id' and 'coordinates' and stores it
# into a MySQL database
def store_data(created_at, text, screen_name, tweet_id, coordinates):
    db=MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO twitter (tweet_id, screen_name, created_at, text, coordinates) VALUES (%s, %s, %s, %s, %s)"
    cursor.execute(insert_query, (tweet_id, screen_name, created_at, text, coordinates))
    db.commit()
    cursor.close()
    db.close()
    return

class StreamListener(tweepy.StreamListener):
    #This is a class provided by tweepy to access the Twitter Streaming API.

    def on_connect(self):
        # Called initially to connect to the Streaming API
        print("You are now connected to the streaming API.")

    def on_error(self, status_code):
        # On error - if an error occurs, display the error / status code
        print('An error has occurred: ' + repr(status_code))
        return False

    def on_data(self, data):
        #This is the meat of the script: it decodes the tweet and stores it in the MySQL database
        try:
           # Decode the JSON from Twitter
            datajson = json.loads(data)

            if datajson['coordinates']=='None':
                print 'coordinates = None, skipped'

            else:

            #grab the wanted data from the Tweet
                text = datajson['text']
                screen_name = datajson['user']['screen_name']
                tweet_id = datajson['id']
                created_at = parser.parse(datajson['created_at'])
                coordinates = datajson['coordinates']

                #print out a message to the screen that we have collected a tweet
                print("Tweet collected at " + str(created_at))
                #print datajson
                #insert the data into the MySQL database
                store_data(created_at, text, screen_name, tweet_id, coordinates)

        except Exception as e:
           print(e)
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
#Set up the listener. The 'wait_on_rate_limit=True' is needed to help with Twitter API rate limiting.
listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True))
streamer = tweepy.Stream(auth=auth, listener=listener)
print("Tracking: " + str(WORDS))
streamer.filter(track=WORDS)

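1 Answer:

Answer 0 (score: 0)

SQL error

In the result returned by tweepy, coordinates is not a number or a string but an object.

You need to parse this object into lat & lon and save each one in its own column.

To get the latitude and longitude, instead of this line: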

coordinates = datajson['coordinates']

do this:

# Twitter returns this array in GeoJSON order: [longitude, latitude]
longitude, latitude = datajson["coordinates"]["coordinates"]
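A minimal sketch of that split, assuming the tweet has already been decoded with json.loads. The helper name extract_lat_lon and the latitude/longitude column names are hypothetical, and an in-memory sqlite3 database stands in for MySQL so the snippet runs on its own (MySQLdb uses %s placeholders where sqlite3 uses ?). Twitter documents the coordinates array in GeoJSON order, so longitude comes first.

```python
import sqlite3


def extract_lat_lon(tweet):
    """Return (latitude, longitude) for a decoded tweet, or None if it has no point."""
    point = tweet.get("coordinates")
    if not point:
        return None
    # GeoJSON order is [longitude, latitude].
    longitude, latitude = point["coordinates"]
    return latitude, longitude


# Sample payload; the values are assumed for illustration.
tweet = {
    "id": 1,
    "coordinates": {"type": "Point", "coordinates": [-81.3792365, 28.5383355]},
}

# sqlite3 stands in for MySQL here; the table layout is hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE twitter (tweet_id INTEGER, latitude REAL, longitude REAL)")

lat_lon = extract_lat_lon(tweet)
if lat_lon is not None:
    # Two plain floats are passed as parameters, never the whole dict.
    db.execute(
        "INSERT INTO twitter (tweet_id, latitude, longitude) VALUES (?, ?, ?)",
        (tweet["id"], lat_lon[0], lat_lon[1]),
    )
    db.commit()

rows = db.execute("SELECT tweet_id, latitude, longitude FROM twitter").fetchall()
print(rows)
```

Passing two floats avoids the 1064 error because the driver no longer has to render a Python dict into the SQL statement.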

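Also, my coordinates condition, which should store only tweets that carry a geolocation, has no effect. All incoming tweets seem to be stored in the database.

'None' is a string, not the value None.

Replace: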

if datajson['coordinates']=='None':

with:

if datajson['coordinates'] is None:

or better:

if not datajson['coordinates']:
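To see why the comparison against 'None' never matches: json.loads maps a JSON null to Python's None, not to the string 'None'. A small sketch of the check; the helper name has_coordinates and the sample payloads are hypothetical.

```python
import json


def has_coordinates(raw):
    """Return True when the raw tweet JSON carries a coordinates object."""
    tweet = json.loads(raw)
    # .get() also covers payloads that have no "coordinates" key at all.
    return bool(tweet.get("coordinates"))


without_geo = '{"id": 1, "coordinates": null}'
with_geo = '{"id": 2, "coordinates": {"type": "Point", "coordinates": [-81.3792365, 28.5383355]}}'

# JSON null decodes to None, so comparing it with the string 'None' is False.
print(json.loads(without_geo)["coordinates"] == "None")  # False
print(has_coordinates(without_geo))
print(has_coordinates(with_geo))
```

The truthiness test is the more forgiving option because it treats None and an empty value the same way, so only tweets with a real coordinates object reach the insert.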