Question

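I am using the tweepy streaming API to fetch tweets containing a particular hashtag. The problem I am facing is that I cannot extract the full text of a tweet from the Streaming API. Only 140 characters are available, and the rest is truncated.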

Here is the code:

import tweepy
from datetime import datetime
from time import sleep

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)

def analyze_status(text):
    # Treat the tweet as a retweet if 'RT' appears in its first three characters
    return 'RT' in text[0:3]

class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Skip retweets; save and print everything else
        if not analyze_status(status.text):
            with open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + '\n\n')
            print(status.text)

    def on_error(self, status):
        # status is the HTTP status code (an int), so convert it before concatenating
        print("Error Code : " + str(status))

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
             if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
               time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the Unix timestamp to a local datetime
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')

# `async` is a reserved word since Python 3.7; tweepy 3.7+ spells the argument `is_async`
myStream.filter(track=['#bitcoin'], is_async=True)

Does anyone have a solution?

Answer 1

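tweet_mode=extended has no effect in this code, because the Streaming API does not support that parameter. If a Tweet contains longer text, the JSON response will include an additional object named extended_tweet, which in turn contains a field named full_text.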

In that case, you will want something like print(status.extended_tweet.full_text) to extract the longer text.
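To make the payload shape described in this answer concrete, here is a minimal sketch that reads the raw JSON dictionary instead of a tweepy object. The function name and the sample payloads are made up for illustration; only the field names text, truncated, extended_tweet, and full_text come from this thread.

```python
def text_from_payload(payload):
    """Return the longest text available in a streaming tweet payload (a dict)."""
    extended = payload.get("extended_tweet")
    if extended and "full_text" in extended:
        # Long tweet: the untruncated text lives in the nested object
        return extended["full_text"]
    # Short tweet: the top-level text is already complete
    return payload.get("text", "")


# Hypothetical payloads, shaped like the fields discussed above
short_tweet = {"text": "hello #bitcoin", "truncated": False}
long_tweet = {
    "text": "a very long tweet that is cut o...",
    "truncated": True,
    "extended_tweet": {"full_text": "a very long tweet that is no longer cut off"},
}

print(text_from_payload(short_tweet))
print(text_from_payload(long_tweet))
```

Working on the plain dictionary sidesteps the question of how a particular tweepy version exposes the nested object.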

Answer 2

In addition to the previous answer: in my case it only worked as status.extended_tweet['full_text'], because status.extended_tweet is just a dictionary.

Answer 3

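You have to enable extended tweet mode, like so:

[code snippet missing from the source]

Then you can print the extended tweet, but keep in mind that, because of how the Twitter API works, you have to make sure the extended tweet exists, otherwise an error will be raised:

[code snippet missing from the source]

Worked for me.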

Answer 4

Building on @AndyPiper's answer, you can check whether the extended tweet exists with a try/except:

  def get_tweet_text(tweet):
    try:
      return tweet.extended_tweet['full_text']
    except AttributeError as e:
      return tweet.text

Or check the internal JSON:

  def get_tweet_text(tweet):
    if 'extended_tweet' in tweet._json:
      return tweet.extended_tweet['full_text']
    else:
      return tweet.text

Note that extended_tweet is a dictionary object, so "tweet.extended_tweet.full_text" does not actually work and will raise an error.
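The difference between attribute access and key access can be shown without a live stream. The sketch below uses types.SimpleNamespace as a stand-in for a tweepy Status object, which is an assumption made for illustration; the sample text is invented.

```python
from types import SimpleNamespace

# Stand-in for a tweepy Status object; a real one would arrive via on_status
tweet = SimpleNamespace(
    text="the complete, untrunc...",
    extended_tweet={"full_text": "the complete, untruncated version"},
)

try:
    tweet.extended_tweet.full_text  # attribute access on a plain dict
except AttributeError as exc:
    error_message = str(exc)

full_text = tweet.extended_tweet["full_text"]  # key access works

print(error_message)
print(full_text)
```

This is also why the try/except variant above catches AttributeError: when the tweet is short, the Status object has no extended_tweet attribute at all.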

Answer 5

This worked for me:

status = tweet
if 'extended_tweet' in status._json:
    status_json = status._json['extended_tweet']['full_text']
elif 'retweeted_status' in status._json and 'extended_tweet' in status._json['retweeted_status']:
    status_json = status._json['retweeted_status']['extended_tweet']['full_text']
elif 'retweeted_status' in status._json:
    status_json = status._json['retweeted_status']['full_text']
else:
    status_json = status._json['full_text']
print(status_json)

Adapted from https://github.com/tweepy/tweepy/issues/935; I had to change what they suggest there, but the idea stays the same.
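The same idea can be sketched on plain dictionaries, with a fallback to text for payloads that carry no full_text key. This is a variant written for illustration, not the code from the linked issue; the function names and sample payloads are hypothetical.

```python
def best_text(payload):
    """Pick the fullest text in one tweet dict, whichever API produced it."""
    if "extended_tweet" in payload:      # streaming payload of a long tweet
        return payload["extended_tweet"]["full_text"]
    if "full_text" in payload:           # REST payload fetched in extended mode
        return payload["full_text"]
    return payload["text"]               # short tweet, nothing was truncated


def full_text_including_retweet(payload):
    """Prefer the original tweet's text when the payload is a retweet."""
    original = payload.get("retweeted_status")
    return best_text(original if original is not None else payload)


# Hypothetical payloads
streamed_retweet = {
    "text": "RT @someone: the start of a long tw...",
    "retweeted_status": {
        "text": "the start of a long tw...",
        "extended_tweet": {"full_text": "the start of a long tweet, in full"},
    },
}
rest_tweet = {"full_text": "fetched with tweet_mode='extended'"}

print(full_text_including_retweet(streamed_retweet))
print(full_text_including_retweet(rest_tweet))
```

Looking inside retweeted_status first matters because the outer text of a retweet is prefixed with "RT @user:" and is often cut short even when the original tweet is not.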

Answer 6

There is a Boolean available in the Twitter stream. "status.truncated" is True when the message contains more than 140 characters. Only then is the "extended_tweet" object available:

        if not status.truncated:
            text = status.text
        else:
            text = status.extended_tweet['full_text']

This only works when you are streaming tweets. When you collect older tweets using the API methods, you can use something like this:

tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)

This full_text field contains the text of all tweets, whether or not they were truncated.
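If the same code has to handle both streamed statuses and statuses fetched with tweet_mode='extended', the two cases above could be folded into one helper. The following is a sketch under that assumption; status_text is a hypothetical name, and SimpleNamespace objects stand in for tweepy Status objects.

```python
from types import SimpleNamespace


def status_text(status):
    """Return the full text of a Status-like object from either API."""
    if hasattr(status, "full_text"):
        # Fetched via an API method with tweet_mode='extended'
        return status.full_text
    if getattr(status, "truncated", False) and hasattr(status, "extended_tweet"):
        # Streamed tweet longer than 140 characters
        return status.extended_tweet["full_text"]
    # Streamed tweet that fits in 140 characters
    return status.text


# Stand-ins for tweepy Status objects, with invented text
streamed_short = SimpleNamespace(text="short tweet", truncated=False)
streamed_long = SimpleNamespace(
    text="long tweet that was cu...",
    truncated=True,
    extended_tweet={"full_text": "long tweet that was cut, now complete"},
)
fetched = SimpleNamespace(full_text="tweet from user_timeline")

for status in (streamed_short, streamed_long, fetched):
    print(status_text(status))
```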

Answer 7

I use the following function:

def full_text_tweeet(id_):
    status = api.get_status(id_, tweet_mode="extended")
    try:
        return status.retweeted_status.full_text
    except AttributeError:  
        return status.full_text

Then I call it while building my list:

tweets_list = []
# loop through all tweets pulled
for tweet in tweets:
    # store the tweet id together with its full text
    tweet_list = [str(tweet.id), str(full_text_tweeet(tweet.id))]
    tweets_list.append(tweet_list)

Answer 8

Try this; it is the simplest and fastest way.

def on_status(self, status):
    if hasattr(status, "retweeted_status"):  # Check if Retweet
        try:
            print(status.retweeted_status.extended_tweet["full_text"])
        except AttributeError:
            print(status.retweeted_status.text)
    else:
        try:
            print(status.extended_tweet["full_text"])
        except AttributeError:
            print(status.text)

Visit the link; it explains how extended tweets can be retrieved.

tweepy Streaming API: full text
