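I just want to understand how Python behaves here and how it works. I have a script that runs and collects all the followers and friends of an account.
Here is the code.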
#!/usr/bin/env python
import pymongo
import tweepy
from pymongo import MongoClient
from sweepy.get_config import get_config

config = get_config()
consumer_key = config.get('PROCESS_TWITTER_CONSUMER_KEY')
consumer_secret = config.get('PROCESS_TWITTER_CONSUMER_SECRET')
access_token = config.get('PROCESS_TWITTER_ACCESS_TOKEN')
access_token_secret = config.get('PROCESS_TWITTER_ACCESS_TOKEN_SECRET')
MONGO_URL = config.get('MONGO_URL')
MONGO_PORT = config.get('MONGO_PORT')
MONGO_USERNAME = config.get('MONGO_USERNAME')
MONGO_PASSWORD = config.get('MONGO_PASSWORD')

client = MongoClient(MONGO_URL, int(MONGO_PORT))
print 'Establishing Tweepy connection'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, retry_count=3)
db = client.tweets
db.authenticate(MONGO_USERNAME, MONGO_PASSWORD)
raw_tweets = db.raw_tweets
users = db.users

def is_user_in_db(screen_name):
    return get_user_from_db(screen_name) is None

def get_user_from_db(screen_name):
    return users.find_one({'screen_name' : screen_name})

def get_user_from_twitter(user_id):
    return api.get_user(user_id)

def get_followers(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.followers, id=screen_name, count=200).pages()):
        print 'Getting page {} for followers'.format(i)
        users += page
    return users

def get_friends(screen_name):
    users = []
    for i, page in enumerate(tweepy.Cursor(api.friends, id=screen_name, count=200).pages()):
        print 'Getting page {} for friends'.format(i)
        users += page
    return users

def get_followers_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.followers_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for followers ids'.format(i)
        ids += page
    return ids

def get_friends_ids(screen_name):
    ids = []
    for i, page in enumerate(tweepy.Cursor(api.friends_ids, id=screen_name, count=5000).pages()):
        print 'Getting page {} for friends ids'.format(i)
        ids += page
    return ids

def process_user(user):
    screen_name = user['screen_name']
    print 'Processing user : {}'.format(screen_name)
    if is_user_in_db(screen_name):
        user['followers_ids'] = get_followers_ids(screen_name)
        user['friends_ids'] = get_friends_ids(screen_name)
        users.insert_one(user)
    else:
        print '{} exists!'.format(screen_name)
    print 'End processing user : {}'.format(screen_name)

if __name__ == "__main__":
    for doc in raw_tweets.find({'processed' : {'$exists': False}}):
        print 'Start processing'
        try:
            process_user(doc['user'])
        except KeyError:
            pass
        try:
            process_user(doc['retweeted_status']['user'])
        except KeyError:
            pass
        raw_tweets.update_one({'_id': doc['_id']}, {'$set':{'processed':True}})
This is what I get in the logs:
Rate limit reached. Sleeping for: 889
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Establishing Tweepy connection
Start processing
Processing user : littleaddy80
Rate limit reached. Sleeping for: 891
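I am puzzled because the `Establishing Tweepy connection` line is outside the `__main__` block, so it should not run over and over again. Why does Python behave this way, or is there a bug in my code?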
Answer 0 (score: 1)
If you want code that runs only when the file is imported, it goes into the `else` clause of the usual `__main__` guard:
if __name__ == '__main__':
    print("Run as a script")
else:
    print("Imported as a module")
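To see both branches fire, the sketch below writes a throwaway module into a temporary directory, runs it once as a script and once through an import, and collects what each run prints. The module name `guard_demo` and the helper `run_both_ways` are made up for this illustration.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Source of a hypothetical module named guard_demo (name invented for this sketch).
MODULE_SOURCE = textwrap.dedent('''
    if __name__ == '__main__':
        print("Run as a script")
    else:
        print("Imported as a module")
''')


def run_both_ways():
    """Return (output when run as a script, output when imported)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guard_demo.py")
        with open(path, "w") as handle:
            handle.write(MODULE_SOURCE)
        # Case 1: equivalent to `python guard_demo.py`
        as_script = subprocess.check_output([sys.executable, path], text=True)
        # Case 2: equivalent to `python -c "import guard_demo"` run in that directory
        as_import = subprocess.check_output(
            [sys.executable, "-c", "import guard_demo"], cwd=tmp, text=True
        )
    return as_script.strip(), as_import.strip()


print(run_both_ways())
```

The first element comes from the `if` branch and the second from the `else` branch, since `__name__` is `'guard_demo'` rather than `'__main__'` during the import.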
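Answer 1 (score: 1)
That is exactly why `if __name__ == "__main__":` exists.
Before this condition you should have the function and class definitions, and after it the code you want to run.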
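Applied to a script like the one in the question, that layout might look like the sketch below. `connect` and `process` are hypothetical stand-ins (no real Tweepy or MongoDB calls are made). The only point is that setup with side effects moves into `main()`, so importing the file does nothing except define names.

```python
def connect():
    """Hypothetical stand-in for creating the API and database clients."""
    print('Establishing connection')
    return {'connected': True}


def process(client, items):
    """Hypothetical stand-in for the per-document work."""
    return ['processed {}'.format(item) for item in items if client['connected']]


def main():
    # Side effects (connecting, printing) happen only when main() is called.
    client = connect()
    return process(client, ['a', 'b'])


if __name__ == '__main__':
    main()
```

With this structure, `import`-ing the file prints nothing; the connection message appears only when the file is executed directly or when `main()` is called explicitly.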
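The reason is that the `__name__` variable has a different value when the file is imported (every Python file is also an importable module) than when it is run directly, for example with `python myfile.py`.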
Create a file, for example `myfile.py`:
# content of myfile.py
print(__name__)
When you run it, it prints `__main__`.
$ python myfile.py
__main__
But during an import, `__name__` holds the name of the imported module (`myfile`).
$ python
>>> import myfile
myfile
Answer 2 (score: 0)
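When you run or import a Python script, every statement in it is executed (on import, though, this happens only the first time the module is imported or when you call `reload(module)`). Some commonly present statements are worth noting: a `def` or `class` statement is executed too, but executing it only defines the function or class; a function body does not run until the function is called.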
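That is why you normally don't put code directly at the top level of a Python script: it will be executed. If the file should work both as a script and as a module, the code that should run only when it is executed as a script should be enclosed in an `if __name__ == '__main__':` statement.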
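The "first import only" behaviour can be checked with a small experiment. The sketch below assumes Python 3, where the built-in `reload` mentioned above lives in `importlib`; the module name `toplevel_demo` and the helper `count_top_level_runs` are invented for the illustration.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# A throwaway module whose only top-level statement has a visible side effect.
MODULE_SOURCE = 'print("top-level code executed")\n'


def count_top_level_runs():
    """Import the module twice, reload it once, and return the lines it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "toplevel_demo.py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        captured = io.StringIO()
        try:
            with contextlib.redirect_stdout(captured):
                module = importlib.import_module("toplevel_demo")  # body runs
                importlib.import_module("toplevel_demo")  # cached, body does not run
                importlib.reload(module)  # body runs again
        finally:
            # Clean up so the experiment can be repeated in the same process.
            sys.path.remove(tmp)
            sys.modules.pop("toplevel_demo", None)
    return captured.getvalue().splitlines()


print(count_top_level_runs())
```

Only two lines are captured for three operations: the second import is served from `sys.modules` without re-executing the module body.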
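Unless you need global variables, your script would be a series of function and class definitions followed by:
if __name__ == '__main__':
    # code to run when executed as a script
If you do need global variables, you sometimes have to take special care to avoid side effects when the module is run or imported.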
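One common way to keep a module-level global without an import-time side effect is to create it lazily, on first use. This is a minimal sketch of that idea, not something taken from the question's code: `get_client` and the dict it returns are hypothetical placeholders for a real connection object.

```python
_client = None      # module-level global, deliberately not created at import time
creation_count = 0  # present only so the sketch can show how often setup ran


def get_client():
    """Create the (placeholder) client on first use and reuse it afterwards."""
    global _client, creation_count
    if _client is None:
        creation_count += 1
        _client = {'connected': True}  # stand-in for a real connection object
    return _client


first = get_client()
second = get_client()
print(first is second, creation_count)
```

Importing a module written this way only defines `get_client`; the setup cost is paid once, by whichever caller asks for the client first.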