Question

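I am trying to save only Urdu tweets using tweepy in Python. I am using Python 3.6. When I open the output file, the Urdu-language data has not been saved: I can see only the usernames, not the Urdu text of the tweets.

The same code works for English. Here is my code: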

import re
import io
import csv
import tweepy
from tweepy import OAuthHandler
#from textblob import TextBlob


consumer_key = "xxxxxxxxxxxxxxxxxxxxx"
consumer_secret = "xxxxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxxxxxx"


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# set access token and secret
auth.set_access_token(access_key, access_secret)
# create tweepy API object to fetch tweets
api = tweepy.API(auth)




def get_tweets(query, count = 300):

    # empty list to store parsed tweets
    tweets = []
    target = io.open("newsfileurdu.csv", 'w', encoding='utf-8')
    # call twitter api to fetch tweets
    q=str(query)
    a=str(q+" اردو")
    b=str(q+" خبریں")
    c=str(q+" خبریں اردو")
    fetched_tweets = api.search(a, count = count)+ api.search(b, count = count)+ api.search(c, count = count)
    # parsing tweets one by one
    print(len(fetched_tweets))

    for tweet in fetched_tweets:

        # empty dictionary to store required params of a tweet
        parsed_tweet = {}
        # saving text of tweet
        parsed_tweet['text'] = tweet.text
        if "http" not in tweet.text:
            line = re.sub("[^A-Za-z]", " ", tweet.text)
            target.write(line+"\n")
    return tweets

    # creating object of TwitterClient Class
    # calling function to get tweets
tweets = get_tweets(query ="", count = 20000)

Answer 1

You can restrict the results to tweets in the desired language (Urdu, in this case) by passing lang="ur" to the api.search function, as in the snippet below.

api.search(q=<query goes here>, lang='ur', tweet_mode='extended', count=<tweets_per_query>)
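
Note that the call above requests tweet_mode='extended'. In that mode tweepy's Status objects carry the untruncated text in full_text rather than text, so a loop that reads tweet.text would need adjusting. Here is a minimal sketch of a helper that copes with both modes. The name tweet_text is hypothetical, and the SimpleNamespace objects only stand in for real tweepy Status objects:

```python
from types import SimpleNamespace


def tweet_text(status):
    """Return the tweet body, preferring the extended-mode attribute."""
    # tweet_mode='extended' populates `full_text`; the default mode populates `text`.
    full = getattr(status, "full_text", None)
    if full is not None:
        return full
    return getattr(status, "text", "")


# Stand-ins for tweepy Status objects (assumption: only these attributes matter here).
extended = SimpleNamespace(full_text="اردو خبریں")
compat = SimpleNamespace(text="short tweet")

print(tweet_text(extended))
print(tweet_text(compat))
```

In the question's loop this would be used as tweet_text(tweet) wherever tweet.text appears.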

The tweepy search API takes ISO 639-1 codes for the lang parameter (so for Urdu it is "ur"). For any other language, look up its ISO 639-1 code and pass that code as lang to fetch tweets in that language.
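
Separately from the language filter, the line re.sub("[^A-Za-z]", " ", tweet.text) in the question replaces every character outside A-Z and a-z with a space. That blanks out Urdu script before it is written, which would explain why only Latin-script usernames appear in the file. Below is a minimal sketch of a cleanup step that keeps Urdu text, assuming the goal is only to drop mentions and tidy whitespace. clean_tweet and save_tweets are hypothetical names, and utf-8-sig is chosen on the assumption that the CSV may be opened in Excel:

```python
import csv
import re


def clean_tweet(text):
    """Remove @mentions and collapse whitespace, leaving Urdu characters intact."""
    text = re.sub(r"@\w+", " ", text)  # drop @mentions
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def save_tweets(texts, path):
    """Write one cleaned tweet per row, skipping tweets that contain links."""
    # newline="" is what the csv module expects; utf-8-sig adds a BOM for Excel.
    with open(path, "w", encoding="utf-8-sig", newline="") as target:
        writer = csv.writer(target)
        kept = 0
        for text in texts:
            if "http" in text:
                continue
            writer.writerow([clean_tweet(text)])
            kept += 1
    return kept


sample = ["@user  یہ اردو\nخبریں ہیں", "link http://example.com"]
print(save_tweets(sample, "newsfileurdu.csv"))
```

Using a with block also closes the file, so buffered rows are flushed to disk. The question's code never closes target.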

Saving Urdu tweets from tweepy to a CSV file
