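I am trying to save only Urdu tweets using tweepy in Python (version 3.6). When I open the output file, the Urdu data has not been saved: I can see only the user names, not the Urdu text of the tweets. The same code works for English. Here is my code: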
```python
import re
import io
import csv
import tweepy
from tweepy import OAuthHandler
#from textblob import TextBlob

consumer_key = "xxxxxxxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxxxxxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access token and secret
auth.set_access_token(access_key, access_secret)
# create tweepy API object to fetch tweets
api = tweepy.API(auth)

def get_tweets(query, count=300):
    # empty list to store parsed tweets
    tweets = []
    target = io.open("newsfileurdu.csv", 'w', encoding='utf-8')
    # call twitter api to fetch tweets
    q = str(query)
    a = str(q + " اردو")
    b = str(q + " خبریں")
    c = str(q + " خبریں اردو")
    fetched_tweets = api.search(a, count=count) + api.search(b, count=count) + api.search(c, count=count)
    print(len(fetched_tweets))
    # parsing tweets one by one
    for tweet in fetched_tweets:
        # empty dictionary to store required params of a tweet
        parsed_tweet = {}
        # saving text of tweet
        parsed_tweet['text'] = tweet.text
        if "http" not in tweet.text:
            line = re.sub("[^A-Za-z]", " ", tweet.text)
            target.write(line + "\n")
    return tweets

# calling function to get tweets
tweets = get_tweets(query="", count=20000)
```
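Independently of which tweets are fetched, the line `re.sub("[^A-Za-z]", " ", tweet.text)` in the code above replaces every character that is not an ASCII letter A to Z or a to z with a space. Urdu is written in Arabic script, so all of its letters are erased and only Latin fragments (such as user names in @mentions) survive, which matches the symptom described. The short demonstration below uses a made-up sample tweet; `clean_tweet` is a hypothetical helper, a sketch of one alternative that only collapses whitespace so each tweet stays on one line of the file.

```python
import re

# Made-up sample tweet: an @mention followed by Urdu text containing a newline.
sample = "@user_123 یہ اردو\nخبریں ہیں"

# The question's cleanup: everything outside A-Z/a-z becomes a space,
# so every Urdu (Arabic-script) character is removed.
latin_only = re.sub("[^A-Za-z]", " ", sample)

def clean_tweet(text):
    # Hypothetical alternative: collapse newlines, tabs and repeated spaces
    # into single spaces so one tweet stays on one line; Urdu letters are kept.
    return re.sub(r"\s+", " ", text).strip()

print(latin_only.split())  # ['user']
print(clean_tweet(sample))
```

With `line = clean_tweet(tweet.text)` in place of the `re.sub(...)` call, the Urdu text would reach `newsfileurdu.csv`, since the file is already opened with `encoding='utf-8'`.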
Answer:

You can restrict the search to tweets in the language you want (Urdu in this case) by passing `lang="ur"` to the `api.search` function, as in this snippet:
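```python
# tweepy 3.x; in tweepy 4.x this method was renamed API.search_tweets.
# The standard search API returns at most 100 tweets per request (count <= 100).
api.search(q=query, lang='ur', tweet_mode='extended', count=tweets_per_query)
```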
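Because the snippet passes `tweet_mode='extended'`, the returned statuses carry their untruncated text in a `full_text` attribute instead of `text`, so the question's `tweet.text` lookups would need adjusting. The sketch below shows one way to handle both modes; `status_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for tweepy `Status` objects so it runs without API credentials.

```python
from types import SimpleNamespace

def status_text(status):
    # Hypothetical helper: with tweet_mode='extended' the tweet text is in
    # `full_text`; in the default (compat) mode only `text` is present.
    full = getattr(status, "full_text", None)
    return full if full is not None else status.text

# Stand-ins for tweepy Status objects (made-up data, no API call).
extended = SimpleNamespace(full_text="پوری ٹویٹ یہاں ہے")
compat = SimpleNamespace(text="مختصر ٹویٹ")

print(status_text(extended))
print(status_text(compat))
```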
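The search API that tweepy wraps takes a `lang` parameter given as an ISO 639-1 code (for Urdu, `ur`). So for any language you want, look up its ISO 639-1 code and pass it as the `lang` parameter to retrieve tweets in that language.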
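Once the tweets are fetched, they still have to be written to the file. A minimal sketch, assuming the statuses come from a search like the one above: `save_tweets_csv` is a hypothetical helper that keeps only tweets whose `lang` attribute (Twitter's machine-detected language code) matches the requested ISO 639-1 code and writes them with Python's `csv` module in UTF-8. Again, `SimpleNamespace` objects stand in for real tweepy statuses.

```python
import csv
import os
import tempfile
from types import SimpleNamespace

def save_tweets_csv(statuses, path, lang_code="ur"):
    """Hypothetical helper: write tweets whose `lang` equals lang_code to a
    UTF-8 CSV file and return how many rows were written."""
    written = 0
    # newline="" is what the csv module expects; UTF-8 keeps Urdu text intact.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "lang", "text"])
        for status in statuses:
            # Skip tweets detected as some other language.
            if getattr(status, "lang", None) != lang_code:
                continue
            # Prefer full_text (tweet_mode='extended'), fall back to text.
            text = getattr(status, "full_text", None)
            if text is None:
                text = status.text
            writer.writerow([status.id, status.lang, text])
            written += 1
    return written

# Stand-ins for tweepy Status objects (made-up data, no API call).
sample = [
    SimpleNamespace(id=1, lang="ur", full_text="اردو خبریں"),
    SimpleNamespace(id=2, lang="en", text="English news"),
]
path = os.path.join(tempfile.mkdtemp(), "newsfileurdu.csv")
print(save_tweets_csv(sample, path))  # 1
```

If the CSV is going to be opened in Excel, writing it with `encoding="utf-8-sig"` adds a byte-order mark that helps Excel recognise the file as UTF-8 rather than showing garbled characters.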