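This is probably a very simple debugging problem (I did not write the code on my own). I have code that runs in a loop and parses scraped XML. The parse runs on a five-minute loop but does not repeat results from one loop to the next, because the user IDs are stored in a set: if a user ID is already present in the user set, the script skips to the next row of the XML. I want to output the results of this script as RSS. I have a potential method for doing that, but I first need to store the data as some kind of variable.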
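I have tried to do this, but every time I do, I seem to run into a problem with the last user ID being stored in the set. Rather than posting the broken code, I have attached a sample of the working code, which leaves out my botched attempts to define the printed results as a variable.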
import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep

###### Code to loop the script and set up scheduling time
s = scheduler(time, sleep)
random.seed()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 45)
    s.run()
##### Code to stop duplicate posts of each user throughout the day
##### (defined once at module level so it persists between runs of getData)
userset = set([])

###### Code to get the data required from the URL desired
def getData():
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]
    ###### These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
                  'rp' : '250',
                  'sortname' : 'roi',
                  'sortorder' : 'desc'
                 }
    ##### Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url, data).read().decode('UTF-8')
    xmlload1 = json.loads(trans_array)
    pattern1 = re.compile('> (.*)<')
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern3 = re.compile('<span style=\'font-size:12px;\'>(.*)<\/span>')
    ##### Making the code identify each row, removing the need to numerically quantify the number of rows in the xml file,
    ##### thus making the number of rows dynamic (changes as the list grows, required for the looping function to work uninterrupted)
    for row in xmlload1['rows']:
        cell = row["cell"]
        ##### Defining the keys (a key is the area from which data is pulled in the XML) for use in the pattern finding/regex
        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']
        ##### Note: strikeratecalc2 is not defined anywhere in this excerpt
        if strikeratecalc2 < 12:
            continue
        ##### REMAINDER OF THE REGEX DELIMITATIONS
        username_delimiter_results = re.findall(pattern1, user_delimiter)[0]
        userid_delimiter_results = re.findall(pattern2, user_delimiter)[0]
        user_selection = re.findall(pattern3, selection_delimiter)[0]
        ##### Skip users that have already been printed
        if userid_delimiter_results in userset:
            continue
        ##### Printing the results of the code at hand
        print "user id = ", userid_delimiter_results
        print "username = ", username_delimiter_results
        print "user selection = ", user_selection
        print ""
        ##### Code to stop duplicate posts of each user throughout the day, part 2 (updating the set to add users already printed to the ignore list)
        ##### The id is wrapped in a list so the whole string is added, not its individual characters
        userset.update([userid_delimiter_results])

getData()
run_periodically(time()+5, time()+1000000, 300, getData)
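The problem I hit when trying to produce the variable (I tried to build it as an array) is that some of the code then skips the final userset.update call. As a result, the last entry in the feed is repeated every time the code runs, because according to 'userset' the user ID in question has not been recorded. Any simple way to output the results of this code as a variable would be much appreciated.

Kind regards,
AEA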
Answer 0 (score: 1)
I got this working by changing the printing section to:
arrayna = [arrayna1, arrayna2, arrayna3, arrayna4]
arraym1 = "user id = ",userid_delimiter_results
Then, to overcome the issue the array would run into on each run of the loop:
my_array = [] # Create an empty list
print(my_array)
So your code would look something like:
This works :)
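For anyone reading later, here is a minimal sketch of the same idea: collect each new row into a list instead of printing it, and record the user ID in the set in the same step so the last entry cannot be missed. It is not the code from the question. The function name collect_new_rows, the tuple shape of rows, and the sample values are all hypothetical stand-ins for the values the regexes extract, and the sketch is written so that it should run on both Python 2 and Python 3.

```python
def collect_new_rows(rows, seen_ids):
    """Return result dicts for rows whose user id is not yet in seen_ids.

    rows: iterable of (user_id, username, selection) tuples. This shape is an
          assumption standing in for the regex results in the question.
    seen_ids: a set that lives outside the function, so it persists between
              runs; it is updated in place.
    """
    results = []  # a fresh list on every run
    for user_id, username, selection in rows:
        if user_id in seen_ids:
            continue  # already reported in an earlier run
        results.append({
            "user_id": user_id,
            "username": username,
            "selection": selection,
        })
        # add() stores the whole string as one element;
        # update() with a bare string would add each character separately.
        seen_ids.add(user_id)
    return results


# Hypothetical sample data for two consecutive runs
seen = set()
first_run = collect_new_rows(
    [("101", "alice", "Red Rum"), ("102", "bob", "Arkle")], seen)
second_run = collect_new_rows(
    [("102", "bob", "Arkle"), ("103", "carol", "Desert Orchid")], seen)
```

Because the append and the add sit next to each other inside the loop, every row that ends up in the returned list is also recorded in the set, so the second run here only returns the row for user "103". The returned list can then be handed to whatever builds the RSS feed.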