Tweepy's Streaming API fails to recognize tweets sent from different devices

Time: 2018-05-23 02:28:42

Tags: python twitter streaming device tweepy

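This is a very strange and specific problem. I am developing a Twitter bot that users can tweet quotes at; the bot takes each tweet and generates an inspirational picture to go with the quote. For example, say I tweet:

@fake_quotes_bot "I'm going to starve myself until they listen" - Gandhi

The bot would then take the quote and the person named after the hyphen and generate an image from them.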

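To start with the basics, I have just written a quote filter to make sure the bot picks up quotes as reliably as possible. So, for example, this would not be accepted:

@fake_quotes_bot "hello 'this is" ' " a "quote" - person

If a user misquotes their tweet (as shown above), the filter makes the bot reply automatically with instructions on how to structure the tweet correctly. When I run the bot in PyCharm on my desktop and tweet at it from a different account, everything works great: malformed tweets get the right error message, and correctly constructed tweets are approved.

The problem appears when I tweet from any device other than the desktop computer running the bot. The logic that works perfectly for tweets sent from the desktop falls flat when the tweet is sent from an iPhone. No matter what tweet I throw at it, the bot responds with the same error message.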

Here is my code:

import tweepy
import json

# Credentials redacted; replace the placeholders with your own keys
consumer_key, consumer_secret = "###", "###"
access_token, access_token_secret = "###", "###"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)


def Data_Analysis(tweet, tweet_data):
    def Data_Write():
        print("DATA FOR THIS TWEET:", tweet_data, "\n")

    def Quote_Filter():

        print("INCOMING TWEET: " + "  > " + str(tweet) + " <   " + " FROM: @" +
              str(tweet_data.get('user', '').get('screen_name', '')) + "/" + tweet_data.get('user', '').get('name', ''))

        def Profanity_Filter():
            pass

        def Person_Filter():
            #WIP for now
            print("Filtering people...", end=" ")
            print("SUCCESSFUL")
            print("APPROVED TWEET: " + tweet)
            print("APPROVED TWEET DATA:", tweet_data, "\n")

        def Quotation_Marks_Filter():

            print("Filtering quotation marks...", end=" ")

            # Filters out tweets that contain quotes
            if '"' in tweet or "'" in tweet:
                double_quote_count = tweet.count('"')
                single_quote_count = tweet.count("'")

                # Double Quotes quote
                if double_quote_count > 0 and single_quote_count == 0:
                    if double_quote_count > 2:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: Please refrain from using too many quotation marks.",
                                          tweet_data.get('id'))

                        print("ERROR: Please refrain from using too many quotation marks. \n")
                    elif double_quote_count == 1:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: Only a singular quote was entered.",
                                          tweet_data.get('id'))

                        print("ERROR: Only a singular quote was entered. \n")
                    # Pass through to other filter
                    else:
                        print("SUCCESSFUL")
                        Person_Filter()

                # Single quotes quote
                elif double_quote_count == 0 and single_quote_count > 0:
                    if single_quote_count > 2:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: Please refrain from using too many quotation marks.",
                                          tweet_data.get('id'))

                        print("ERROR: Please refrain from using too many quotation marks. \n")
                    elif single_quote_count == 1:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: Only a singular quote was entered.",
                                          tweet_data.get('id'))

                        print("ERROR: Only a singular quote was entered. \n")
                    # Pass through to other filter
                    else:
                        print("SUCCESSFUL")
                        Person_Filter()

                # If a quote has two types of quotes
                else:
                    # Filter if there are too many quotes per character
                    if double_quote_count > 2:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: If you are implementing a quote within a quote or are abbreviating,"
                                          "please refrain from using more than two instances of a double quote."
                                          , tweet_data.get('id'))

                        print("ERROR: If you are implementing a quote within a quote or are abbreviating,"
                              "please refrain from using more than two instances of a double quote. \n")
                    elif double_quote_count == 1:
                        api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                          " ERROR: Could not identify the quote. If you are implementing a quote "
                                          "within a quote or are abbreviating,  please use two instances of the "
                                          "double quote.",
                                          tweet_data.get('id'))

                        print("ERROR: Could not identify the quote. If you are implementing a quote "
                              "within a quote or are abbreviating,  please use two instances of the "
                              "double quote. \n")

                    # If it's correct in its number, then figure out its beginning and ending quotes to pull text
                    else:
                        quote_indexes = []
                        quote_chars = []

                        indices = [index for index, value in enumerate(tweet) if value == '"']
                        for i in indices:
                            quote_indexes.append(i)
                            quote_chars.append('"')

                        indices = [index for index, value in enumerate(tweet) if value == "'"]
                        for i in indices:
                            quote_indexes.append(i)
                            quote_chars.append("'")

                        beginning_quote = quote_indexes.index(min(quote_indexes))
                        ending_quote = quote_indexes.index(max(quote_indexes))

                        # If the starting and ending quotes are similar (I.E. " and ") then pass through to other filter
                        if quote_chars[beginning_quote] == quote_chars[ending_quote]:
                            print("SUCCESSFUL")
                            Person_Filter()

                        # Do not align
                        else:
                            api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                              " ERROR: The beginning and endings quotes do not align.",
                                              tweet_data.get('id'))

                            print("ERROR: The beginning and endings quotes do not align. \n")

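            # No quote found. The original condition `'"' or "'" not in tweet` was always
            # true (a non-empty string is truthy), so a plain else expresses the same logic.
            else: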
                grab_user = tweet_data.get('user', '').get('screen_name', '')

                if grab_user == "fake_quotes_bot":
                    # If I were to test this on my own twitter handle, it would get stuck in an auto-reply loop.
                    # Which will probably ban me.
                    print("PASSING UNDER MY OWN SCREEN NAME... \n")

                if grab_user != "fake_quotes_bot":
                    api.update_status("@" + str(tweet_data.get('user', '').get('screen_name', '')) +
                                      " ERROR: This tweet does not contain a quote. Be sure to use quotation marks.",
                                      tweet_data.get('id'))

                    print("ERROR: This tweet does not contain a quote. Be sure to use quotation marks. \n")

        def Retweet_Filter():

            print("Filtering retweets...", end=" ")

            # Filters out tweets that are retweets
            if "RT" in tweet[0:3]:
                print("RETWEET. SKIPPING... \n")
            else:
                print("SUCCESSFUL")
                Quotation_Marks_Filter()

        Retweet_Filter()

    Quote_Filter()


class StreamListener(tweepy.StreamListener):

    def on_data(self, data):
        tweet_data = json.loads(data)
        if "extended_tweet" in tweet_data:
            tweet = tweet_data['extended_tweet']['full_text']
            Data_Analysis(tweet, tweet_data)

        else:
            try:
                tweet = tweet_data['text']
                Data_Analysis(tweet, tweet_data)
            except KeyError:
                print("ERROR: Failed to retrieve tweet. \n")


print("BOT IS NOW RUNNING. SEARCHING FOR TWEETS...\n")

Listener = StreamListener()
Stream = tweepy.Stream(auth=api.auth, listener=Listener, tweet_mode='extended')
Stream.filter(track=['@fake_quotes_bot'])

Output for a tweet sent from the same desktop:

INCOMING TWEET:   > @fake_quotes_bot "hello, stackoverflow!" <    FROM: @bulletinaction/BulletInAction
Filtering retweets... SUCCESSFUL
Filtering quotation marks... SUCCESSFUL
Filtering people... SUCCESSFUL
APPROVED TWEET: @fake_quotes_bot "hello, stackoverflow!"
APPROVED TWEET DATA: {###data###} 

Output when I send the tweet from my phone:

INCOMING TWEET:   > @fake_quotes_bot “heyyo, stackoverflow” <    FROM: @bulletinaction/BulletInAction
Filtering retweets... SUCCESSFUL
Filtering quotation marks... ERROR: This tweet does not contain a quote. Be sure to use quotation marks. 

Here is a YouTube video of the code running, since I realize this is a very strange problem: https://www.youtube.com/watch?v=skErnva4ePc&feature=youtu.be

1 answer:

Answer 0 (score: 1)

Your phone sends left and right double quotation marks (curly quotes) instead of the plain ASCII quotation mark:

"   U+0022 QUOTATION MARK
“   U+201C LEFT DOUBLE QUOTATION MARK
”   U+201D RIGHT DOUBLE QUOTATION MARK

(From: Are there different types of double quotes in utf-8 (PHP, str_replace)?)
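To confirm which characters a given tweet actually contains, it can help to print the Unicode code point and name of every quote-like character. The following is a minimal sketch using only the standard library; the helper name `describe_quotes` and the sample string are illustrative and not part of the original bot.

```python
import unicodedata


def describe_quotes(text):
    """Return (char, code point, Unicode name) for ASCII quotes and any non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ch in "\"'" or ord(ch) > 127
    ]


# Sample text mimicking the tweet sent from the phone (curly quotes written as escapes)
phone_tweet = "@fake_quotes_bot \u201cheyyo, stackoverflow\u201d"

for entry in describe_quotes(phone_tweet):
    print(entry)
```

Running this against the `tweet` string inside `on_data` should show whether the incoming text carries U+0022 or the curly variants.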

So just run a replacement on the tweet text before testing it:

tweet = tweet_data['extended_tweet']['full_text']  # as you did, then:
# Python strings have no replaceAll(); chain str.replace() calls instead
tweet = tweet.replace("\u2018", "'").replace("\u2019", "'")
tweet = tweet.replace("\u201C", '"').replace("\u201D", '"')

(From: Converting MS word quotes and apostrophes)
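Because the bot reads the tweet text in two places (`extended_tweet.full_text` and `text`), one option is to put the replacement in a single helper and call it before `Data_Analysis`. The sketch below uses `str.translate` with a mapping built by `str.maketrans`; the names `QUOTE_MAP` and `normalize_quotes` are hypothetical, and the mapping covers only the four curly quote characters mentioned above.

```python
# Map curly quotes to their plain ASCII equivalents (assumed to be the only variants that matter here)
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def normalize_quotes(text):
    """Return text with curly quotes replaced by ASCII quotes."""
    return text.translate(QUOTE_MAP)


# Sample text mimicking the tweet sent from the phone
phone_tweet = "@fake_quotes_bot \u201cheyyo, stackoverflow\u201d"
normalized = normalize_quotes(phone_tweet)

print(normalized)
print(normalized.count('"'))
```

In the listener, this would be applied as `tweet = normalize_quotes(tweet)` in both branches of `on_data`, so the quote-counting logic sees the same characters regardless of the sending device.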