Question

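I'm trying to get this script to run every 30 minutes. At the moment it only runs once, and I'm a bit confused as to why it doesn't run repeatedly.

Any idea where my code is going wrong? Basically, this script fetches data from XML and then writes it to a CSV file.

I tried using threading to make it run every so many seconds. I'm currently running it on PythonAnywhere, and it only runs once. A bit confused as to why that is!

I also tried using a while loop; an example of what I tried is included next to the threading code.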

from lxml import etree
import urllib.request
import csv
import threading

#Pickle is not needed
#append to list next

def handleLeg(leg):
   # print this leg as text, or save it to file maybe...
   text = etree.tostring(leg, pretty_print=True)
   # also process individual elements of interest here if we want
   tagsOfInterest=["noTrafficTravelTimeInSeconds", "lengthInMeters", "departureTime", "trafficDelayInSeconds"]  # whatever
   #list to use for data analysis
   global data
   data = []
   #create header dictionary that includes the data to be appended within it. IE, Header = {TrafficDelay[data(0)]...etc
   for child in leg:
       if 'summary' in child.tag:
          for elem in child:
              for item in tagsOfInterest:
                  if item in elem.tag:
                      data.append(elem.text)


def parseXML(xmlFile):
   """While option
   lastTime = time.time() - 600
   while time.time() >= lastTime + 600:
    lastTime += 600"""
   #Parse the xml
   threading.Timer(5.0, parseXML).start()
   with urllib.request.urlopen("https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=xxx&computeTravelTimeFor=all") as fobj:
       xml = fobj.read()

   root = etree.fromstring(xml)

   for child in root:
       if 'route' in child.tag:
           handleLeg(child)
           # Write CSV file
           with open('datafile.csv', 'w') as fp:
            writer = csv.writer(fp, delimiter=' ')
            # writer.writerow(["your", "header", "foo"])  # write header
            writer.writerows(data)
           """for elem in child:
               if 'leg' in elem.tag:
                   handleLeg(elem)
"""


if __name__ == "__main__":
   parseXML("xmlFile")

with open('datafile.csv', 'r') as fp:
    reader = csv.reader(fp, quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]

print(data_read)

Answer 1

How do you know it only runs once? Have you debugged it, or are you expecting the correct result by the time the code reaches this part?

with open('datafile.csv', 'r') as fp:
    ....

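Overall, what do you expect to happen, and when should your program reach this part? I can't tell how to fix this without knowing what you want it to do, but I think I know where your problem is.

Here is what your program does. I'll call the main thread M:

M: if __name__ == "__main__" matches, parseXML is called
M: parseXML starts a new thread, call it T1, via threading.Timer()
M: parseXML finishes and with open... is reached        T1: sleep(5)
M: print(data_read)                                     T1: probably still in sleep(5)
M: exits, just waiting for other threads to terminate   T1: parseXML
M: -                                                    T1: starts a new thread T2
M: -                                                    T1: finishes parseXML      T2: sleep(5)
M: -                                                    T1: thread exits           T2: still in sleep(5)
M: -                                                    T1: -                      T2: parseXML
M: -                                                    T1: -                      T2: starts a new thread T3
...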

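The way your program is structured, parseXML (probably; I couldn't run your code, but it looks right) does start a delayed copy of itself in a new thread. However, the main program that processes the result has already finished by then, so it never reads datafile.csv again after the new timer threads modify it.

You can verify this by setting daemon=True on the thread (which means the thread is killed as soon as the main program exits). Now your program no longer "hangs". It shows the result after the first iteration of parseXML and terminates the timer thread immediately: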

#Parse the xml
   _t = threading.Timer(5.0, parseXML)
   _t.daemon = True
   _t.start()
   with urllib.request.urlopen(....)
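
One more thing worth checking, as a side note: in the question, parseXML takes an xmlFile parameter, but the Timer is created without arguments, so the timer thread would call parseXML() with nothing and most likely fail with a TypeError. threading.Timer accepts an args= parameter for this. Below is a minimal sketch of a self-rescheduling daemon timer; the names tick, run_demo, results, remaining and done are made up for illustration and are not part of your code:

```python
import threading


def tick(label, interval, results, remaining, done):
    """Record one run, then schedule the next run on a daemon Timer thread."""
    results.append((label, threading.current_thread().name))
    if remaining > 1:
        # args= forwards positional arguments to the callback. Without it the
        # Timer would call tick() with no arguments and raise TypeError.
        timer = threading.Timer(
            interval, tick, args=(label, interval, results, remaining - 1, done)
        )
        timer.daemon = True  # do not keep the process alive on our account
        timer.start()
    else:
        done.set()  # signal that the last scheduled run has happened


def run_demo(interval=0.05, runs=3):
    """Start the chain and keep the calling thread alive until it finishes."""
    results = []
    done = threading.Event()
    tick("job", interval, results, runs, done)
    # Daemon timers die with the main thread, so the caller has to wait.
    done.wait(timeout=5)
    return results
```

Calling run_demo() returns one (label, thread name) pair per run. The first run happens in the calling thread and the later ones in Timer threads, which is the same hand-off shown in the M/T1/T2 timeline.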

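Do you really need threads? Or could you move the datafile.csv processing and display into parseXML, put a while True loop there, and sleep for 5 seconds between iterations?

Another possibility is to move the data reader part into another thread that sleeps N seconds and then runs the reader. In that case, though, you need a lock. If you work on the same file from different threads, sooner or later the unexpected happens: your writer has written only part of the file at the moment the reader decides to read it. Your parser will most likely crash with a syntax error. To avoid this, create a global lock and use it to protect the file read and write operations: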
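
A rough sketch of that single-threaded loop, to show the shape of it. The function name poll_forever and its parameters (fetch_rows, csv_path, interval_seconds, max_iterations) are my own placeholders; fetch_rows stands in for your download-and-parse step and is assumed to return a list of rows:

```python
import csv
import time


def poll_forever(fetch_rows, csv_path, interval_seconds=5.0, max_iterations=None):
    """Fetch, write, read back and print in one loop, sleeping between passes.

    max_iterations=None loops until the process is stopped; a number is
    handy for trying the loop out.
    """
    completed = 0
    while max_iterations is None or completed < max_iterations:
        rows = fetch_rows()  # placeholder for the urlopen + lxml parsing step

        # Write this pass's rows, replacing the previous file contents.
        with open(csv_path, "w", newline="") as fp:
            csv.writer(fp).writerows(rows)

        # Read the file back and display it in the same pass.
        with open(csv_path, newline="") as fp:
            print(list(csv.reader(fp)))

        completed += 1
        if max_iterations is None or completed < max_iterations:
            time.sleep(interval_seconds)  # wait before the next pass
    return completed
```

For a 30-minute interval you would pass interval_seconds=1800. Because writing and reading happen one after the other in the same thread, no lock is needed in this variant.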

foo = threading.Lock()
....
....

with foo:
    with open(...) as fp:
        ....

Now your file operations stay atomic.

Sorry for the long explanation; I hope this helps.
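
To make the lock idea a bit more concrete, here is a small sketch with the lock wrapped around both sides. The names file_lock, write_rows and read_rows are hypothetical (file_lock plays the role of foo above). Note the assumption: this only protects against other threads in the same process that also go through the lock, not against other processes touching the file:

```python
import csv
import threading

# One module-level lock shared by every thread that touches the file.
file_lock = threading.Lock()


def write_rows(path, rows):
    """Rewrite the CSV file while holding the lock."""
    with file_lock:
        with open(path, "w", newline="") as fp:
            csv.writer(fp).writerows(rows)


def read_rows(path):
    """Read the whole CSV file while holding the lock."""
    with file_lock:
        with open(path, newline="") as fp:
            return list(csv.reader(fp))
```

A writer thread would call write_rows(...) after each fetch, and a reader thread would call read_rows(...) on its own schedule. Because both take the same lock, the reader never sees a half-written file.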
