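My code currently prints out data for each user from an XML file (obtained from a website). The XML updates throughout the day as more users interact with the site, and I currently have my code loop to download this data every 5 minutes.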
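Each time the code runs, it produces a list of users and their stats:
First 5 minutes it prints users: a, b, c
Second 5 minutes it prints users: a, b, c, d, e
Third 5 minutes it prints users: a, b, c, d, e, f, g

I need the code to print:
First 5 minutes: a, b, c
Second 5 minutes: d, e
Third 5 minutes: f, g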
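Is there some way to recognize that certain users have already been printed? Each user has a unique user ID, which I imagine could be matched against?
I've attached an example of my code in case it helps.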
```python
import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep

###### Code to loop the script and set up scheduling time
s = scheduler(time, sleep)
random.seed()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        # Next run: `interval` seconds later plus a random jitter of -5..44 s
        event_time += interval + random.randrange(-5, 45)
    s.run()

###### Code to get the data required from the URL desired
def getData():
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

    ###### These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page': '1',
                  'rp': '250',
                  'sortname': 'roi',
                  'sortorder': 'desc'}
    ##### Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url, data).read().decode('UTF-8')

    xmlload1 = json.loads(trans_array)
    pattern1 = re.compile('> (.*)<')
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern3 = re.compile('<span style=\'font-size:12px;\'>(.*)<\/span>')

    #########################################################################
    ##### The request sent from here all the way down including comments#####
    #########################################################################
    ##### Making the code identify each row, removing the need to numerically quantify the number of rows in the xmlfile,
    ##### thus making number of rows dynamic (change as the list grows, required for looping function to work uninterrupted)
    for row in xmlload1['rows']:
        cell = row["cell"]

        ##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex
        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']

        # strikeratecalc2 is computed in code not shown in this excerpt, so the
        # filter is commented out here to keep the snippet runnable:
        # if strikeratecalc2 < 12: continue

        ##### REMAINDER OF THE REGEX DELIMITATIONS
        username_delimiter_results = re.findall(pattern1, user_delimiter)[0]
        userid_delimiter_results = re.findall(pattern2, user_delimiter)[0]
        user_selection = re.findall(pattern3, selection_delimiter)[0]

        ##### Printing the results of the code at hand
        print "user id = ", userid_delimiter_results
        print "username = ", username_delimiter_results
        print "user selection = ", user_selection
        print ""

getData()
# The interval is in seconds: 3000 s is 50 minutes (300 would be 5 minutes)
run_periodically(time() + 5, time() + 1000000, 3000, getData)
```
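Please go easy in the comments; I have been coding for a total of 11 days, so please forgive any glaring mistakes in the code I'm using, although it has worked so far.
Kind regards
AEA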
Answer 0 (score: 5)
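I think you can simply store the unique IDs somewhere (for example in a file or a database; Redis is my favorite) and then check new rows against them.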
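For the setup in the question, where `run_periodically` keeps a single Python process alive, a set held in memory between runs may already be enough; a file only matters if the script is restarted. The following is a minimal sketch of the "store the IDs and check them" idea under those assumptions. The class `SeenUserIds`, its `filter_new` method and any file name you pass in are hypothetical, not part of the original code, and the IDs are assumed to be plain ASCII strings like the ones extracted with `pattern2`.

```python
import os


class SeenUserIds(object):
    """Remembers user IDs that were already printed (hypothetical helper).

    If `path` is given, IDs are also appended to that text file (one per
    line) so they survive a restart of the script; path=None keeps them
    in memory only.
    """

    def __init__(self, path=None):
        self.path = path
        self.ids = set()
        if path and os.path.exists(path):
            with open(path) as f:
                self.ids = set(line.strip() for line in f if line.strip())

    def filter_new(self, user_ids):
        """Return the IDs not seen before (in input order) and remember them."""
        new_ids = []
        for uid in user_ids:
            if uid not in self.ids:
                self.ids.add(uid)  # adding here also drops repeats within one batch
                new_ids.append(uid)
        if self.path and new_ids:
            with open(self.path, 'a') as f:
                for uid in new_ids:
                    f.write(uid + '\n')
        return new_ids
```

With the lists from the question, three successive calls on the same object return `['a', 'b', 'c']`, then `['d', 'e']`, then `['f', 'g']`. In `getData()` one would create `seen = SeenUserIds()` once at module level (outside the function) and, right after `userid_delimiter_results` is extracted, skip the row with `if not seen.filter_new([userid_delimiter_results]): continue`.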
To store them with Redis, you can do the following:
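```python
# redis
import redis

pwd = 'l33t'  # example password
r = redis.StrictRedis(host='localhost', port=6379, db=1, password=pwd)

unique_id = '12345'  # placeholder: use the user ID your script extracted

# set ids
r.sadd('user_ids', unique_id)  # this is a set, with no duplicates

# check for existing ids
r.sismember('user_ids', unique_id)  # truthy if the ID is already stored (Redis replies 1 or 0)
```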
See http://redis.io/commands#set and https://github.com/andymccurdy/redis-py. You will need Redis and redis-py, which take about two minutes to install.
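A possible refinement of the Redis snippet, assuming redis-py: the SADD command replies with the number of members it actually added (1 for a new ID, 0 for one already in the set), so a single `sadd` call can serve as both the check and the store, with no separate `sismember` round trip. The helper name `print_new_users` and its list of `(user_id, username)` pairs are hypothetical; the function takes the client as an argument so it can be used with the `r` created above.

```python
def print_new_users(r, users, key='user_ids'):
    """Print only users whose ID was not yet in the Redis set `key`.

    `r` is assumed to be a redis-py client such as redis.StrictRedis(...);
    `users` is a list of (user_id, username) pairs. Returns the new IDs.
    """
    new_ids = []
    for user_id, username in users:
        # SADD returns how many members were actually added:
        # 1 if user_id is new, 0 if it was already in the set.
        if r.sadd(key, user_id):
            new_ids.append(user_id)
            print("user id = %s, username = %s" % (user_id, username))
    return new_ids
```

Because the check and the insert are one command, two overlapping runs of the script cannot both decide that the same ID is new.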