Question

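This is a Twitter scraping script that extracts tweets containing given keywords.

I want to repeat the entire code below every 12 hours (or every 12 hours plus a 10-minute break). Can you suggest a way to make it repeat?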

import tweepy
import time
import os
import json
import simplejson

search_term = 'word1'
search_term2= 'word2'
search_term3='word3'

lat = "xxxx"
lon = "xxxx"
radius = "xxxx"
location = "%s,%s,%s" % (lat, lon, radius)

API_key = "xxxx"
API_secret = "xxxx"
Access_token = "xxxx"
Access_token_secret = "xxxx"

auth = tweepy.OAuthHandler(API_key, API_secret)
auth.set_access_token(Access_token, Access_token_secret)
api = tweepy.API(auth)

# Search for tweets near the given location that match any of the three terms
c=tweepy.Cursor(api.search,
                q="{}+OR+{}+OR+{}".format(search_term, search_term2, search_term3),
                rpp=1000,
                geocode=location,
                include_entities=True)

data = {}
i = 1
for tweet in c.items():
    data['text'] = tweet.text
    print(i, ":", data)
    i += 1
    time.sleep(1)

# Write the text of each matching tweet to workk2.txt, one tweet per line
with open(os.path.join(os.getcwd(), "workk2.txt"), mode='w') as wfile:
    for tweet in c.items():
        wfile.write(tweet.text + '\n')

Answer 1

You can set up a cron job that runs the script every 12 hours. To do so, save your script with a .py extension and make it executable. Then add it to your crontab:

0 */12 * * * /usr/bin/python yourscript.py

For details, see this question. Alternatively, Python packages such as APScheduler can help you achieve this. In APScheduler, you can define a job like this:

from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('interval', hours=12)
def timed_job():
    print('This job is run every 12 hours.')

# start() blocks and runs timed_job every 12 hours until the process is stopped
sched.start()
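
If you would rather not depend on cron or an extra package, a plain loop with time.sleep can also do the job. The following is only a minimal sketch under a few assumptions: run_scrape is a hypothetical placeholder for the search-and-write code from the question, repeat_every and its parameters are names made up for this example, and the pause of 12 hours plus 10 minutes is taken from the question.

```python
import time

TWELVE_HOURS = 12 * 60 * 60   # seconds
TEN_MINUTES = 10 * 60         # seconds
PAUSE_SECONDS = TWELVE_HOURS + TEN_MINUTES


def run_scrape():
    """Hypothetical placeholder: put the tweepy search-and-write code here."""
    print("Scraping tweets...")


def repeat_every(job, pause=PAUSE_SECONDS, sleep=time.sleep, max_runs=None):
    """Call job(), wait `pause` seconds, and repeat.

    max_runs=None loops forever; a number stops after that many runs
    (useful for trying the loop out with a short pause).
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:
            # Log the failure and keep going, so one bad run
            # (for example a network error) does not end the loop.
            print("Job failed:", exc)
        runs += 1
        sleep(pause)
    return runs


# To start the loop (this blocks until the process is stopped):
# repeat_every(run_scrape)
```

Note that the pause is counted from the end of each run, so the start times drift by however long the scrape itself takes. If the runs need to start at fixed clock times, cron or APScheduler is the better fit.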

Twitter scraping: repeatedly executing code (Python)
