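I am trying to fetch only Tweets that contain the #not hashtag, but only when the hashtag is at the end of the Tweet rather than in the middle of the text. I am using tweepy.Cursor.
This code already works. It gives me tweets with #not, but it does not care where #not appears.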
import tweepy
consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    print(tweet.created_at, tweet.text)
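Answer 0 (score: 1)
Edit: you can use a regular expression to check whether your hashtag is part of the trailing run of hashtags at the end of the tweet: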
import tweepy
import re
consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
# Regular expression to check if tweet ends with our hashtag and maybe more hashtags
rgx = re.compile(r"#not(\s+#\w+)*$", re.IGNORECASE)
for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    # Keep only tweets with the hashtag at the end
    if rgx.search(tweet.text):
        print(tweet.created_at, tweet.text)
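To try the pattern without calling the Twitter API, the check can be pulled into a small function and run on plain strings. This is only a sketch: the names `build_trailing_hashtag_pattern` and `has_trailing_hashtag` are made up for illustration, and the pattern differs slightly from the one above (it adds a lookbehind so `abc#not` is not counted, and tolerates trailing whitespace).

```python
import re


def build_trailing_hashtag_pattern(tag):
    """Compile a pattern that matches `#tag` when only hashtags follow it.

    `tag` is given without the leading '#'. Hypothetical helper, not part of Tweepy.
    """
    # (?<!\w)       the '#' must not be glued to a preceding word character
    # (?:\s+#\w+)*  zero or more further hashtags after ours
    # \s*$          optional trailing whitespace, then end of text
    return re.compile(
        r"(?<!\w)#" + re.escape(tag) + r"(?:\s+#\w+)*\s*$",
        re.IGNORECASE,
    )


TRAILING_NOT = build_trailing_hashtag_pattern("not")


def has_trailing_hashtag(text, pattern=TRAILING_NOT):
    """Return True if `text` ends with the hashtag, possibly followed by more hashtags."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = [
        "I love Mondays #not",
        "I love Mondays #NOT #sarcasm",
        "#not what I expected today",
        "Great service #nothing",
    ]
    for sample in samples:
        print(has_trailing_hashtag(sample), repr(sample))
```

One caveat, stated as an assumption about your setup: if the API returns truncated text, a trailing hashtag may be cut off before the check runs. Tweepy accepts `tweet_mode="extended"` on the search call, in which case the text is on `tweet.full_text` instead of `tweet.text`.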
You can filter the tweets to keep only those that meet your requirement:
import tweepy
consumer_key = 'consumer key'
consumer_secret = 'consumer secret'
access_token = 'access token'
access_token_secret = 'access token secret'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
for tweet in tweepy.Cursor(api.search, q="#not", count=5,
                           lang="en",
                           since="2017-04-03").items():
    # Keep only tweets with the hashtag at the end
    if tweet.text.lower().endswith('#not'):
        print(tweet.created_at, tweet.text)
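The `endswith` filter can also be written as a generator that works on any iterable of objects with a `.text` attribute, which makes it easy to check offline. A rough sketch, assuming stand-in objects built with `SimpleNamespace` in place of real Tweepy statuses; `ends_with_tag` and `keep_trailing` are illustrative names, and the added `rstrip()` is a small change so trailing spaces or newlines do not defeat the check.

```python
from types import SimpleNamespace


def ends_with_tag(text, tag="#not"):
    """Case-insensitive check that `text` ends with `tag`, ignoring trailing whitespace."""
    return text.rstrip().lower().endswith(tag.lower())


def keep_trailing(tweets, tag="#not"):
    """Yield only the tweets whose text ends with `tag`."""
    for tweet in tweets:
        if ends_with_tag(tweet.text, tag):
            yield tweet


# Stand-ins for Tweepy status objects (only the attributes used here).
fake_tweets = [
    SimpleNamespace(created_at="2017-04-03", text="Love waiting in line #Not"),
    SimpleNamespace(created_at="2017-04-04", text="#not sure about this"),
    SimpleNamespace(created_at="2017-04-05", text="Best day ever #not #sarcasm"),
]

if __name__ == "__main__":
    for tweet in keep_trailing(fake_tweets):
        print(tweet.created_at, tweet.text)
```

With real data, the same generator could wrap the cursor, e.g. `keep_trailing(tweepy.Cursor(...).items())`. Note that this simple version drops the third sample, because another hashtag follows `#not`; the regular expression approach handles that case.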