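Twitter scraping: running the code repeatedly (Python)

Time: 2017-04-29 08:20:04

Tags: python twitter web-scraping web-crawler scheduler

This is Twitter scraping code that extracts tweets containing certain well-known keywords.

I want to repeat the whole code below every 12 hours (or every 12 hours plus a 10-minute break). Can you suggest a way to make it repeat?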

import tweepy
import time
import os
import json
import simplejson

search_term = 'word1'
search_term2= 'word2'
search_term3='word3'

lat = "xxxx"
lon = "xxxx"
radius = "xxxx"
location = "%s,%s,%s" % (lat, lon, radius)

API_key = "xxxx"
API_secret = "xxxx"
Access_token = "xxxx"
Access_token_secret = "xxxx"

auth = tweepy.OAuthHandler(API_key, API_secret)
auth.set_access_token(Access_token, Access_token_secret)
api = tweepy.API(auth)

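# Build one query from all three terms; the original format string had only
# two placeholders, so search_term3 was silently dropped.
c = tweepy.Cursor(api.search,
                  q="{} OR {} OR {}".format(search_term, search_term2, search_term3),
                  count=100,  # the v1.1 search API returns at most 100 tweets per page
                  geocode=location,
                  include_entities=True)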

# Print and save in a single pass: looping over c.items() a second time may
# repeat the API requests or find the cursor already exhausted.
data = {}
i = 0
with open(os.path.join(os.getcwd(), "workk2.txt"), mode='w') as wfile:
    for tweet in c.items():
        i += 1
        data['text'] = tweet.text
        print(i, ":", data)
        wfile.write(data['text'] + '\n')
        time.sleep(1)  # short pause between tweets, as in the original loop

1 answer:

Answer 0 (score: 1)

You can set up a cron job that runs the script every 12 hours. To do this, save the script with a .py extension and make it executable. Then add it to your crontab:

0 */12 * * * /usr/bin/python yourscript.py

For details, see this question. Alternatively, Python packages such as APScheduler can help you achieve this. In APScheduler, you can define a job like this:

from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

@sched.scheduled_job('interval', hours=12)
def timed_job():
    print('This job is run every 12 hours.')

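# sched.configure(...) is optional and takes scheduler options; it is left out
# here because options_from_ini_file was never defined in this snippet.
sched.start()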
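
If installing a scheduler package or editing the crontab is not an option, the "12 hours plus a 10-minute break" variant from the question can also be approximated with a plain loop and time.sleep. The following is only a minimal sketch under some assumptions: run_repeatedly and scrape_once are hypothetical names that do not appear in the question, scrape_once stands in for the tweepy search-and-save code, and the wait is counted from the end of one run to the start of the next.

```python
import time

TWELVE_HOURS = 12 * 60 * 60   # seconds
TEN_MINUTES = 10 * 60         # seconds


def run_repeatedly(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job(), wait interval_seconds, and repeat.

    max_runs=None loops forever. sleep is injectable so the loop can be
    tried out without really waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:
            # Keep the loop alive if a single run fails (e.g. a network error).
            print("job failed:", exc)
        runs += 1
        # Only wait if another run is still to come.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


def scrape_once():
    # Hypothetical placeholder: put the tweepy search-and-save code here.
    print("scraping...")


# Runs until the process is stopped:
# run_repeatedly(scrape_once, TWELVE_HOURS + TEN_MINUTES)
```

Unlike cron, this loop only works while the Python process stays alive, so it would need to be restarted after a reboot or crash.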