Parsing large XML files in Python 3

Date: 2017-08-22 06:44:07

Tags: python python-3.x xml-parsing lxml

I'm new to Python, and I'm looking for a fast way to parse a large XML file (~0.5-1 GB) with the following structure:

<timestep time="2.00">
    <vehicle id="carflow.0" x="-9897.274589" y="-8.250000" speed="49.840822" lane="section1_0" />
    .... (more vehicles)
</timestep>
... (more timesteps)
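Every value in a `vehicle` element arrives as a string, so each row needs a few conversions. A minimal sketch of that per-element step, run on the sample line above with the standard library's `xml.etree.ElementTree` (the question uses lxml; both expose `.attrib` as a dict). The helper name `vehicle_row` is hypothetical, and `rsplit(sep, 1)` assumes the numeric index is always the last component of `id` and `lane`, as in `carflow.0` and `section1_0`.

```python
import xml.etree.ElementTree as ET

def vehicle_row(attrib, time):
    """Convert one <vehicle> attribute dict into a tuple of numbers.

    Hypothetical helper: assumes 'id' ends in '.<int>' and 'lane' ends in '_<int>'.
    """
    vid = int(attrib["id"].rsplit(".", 1)[1])      # "carflow.0" -> 0
    lane = int(attrib["lane"].rsplit("_", 1)[1])   # "section1_0" -> 0
    return (time, vid, float(attrib["x"]), float(attrib["y"]),
            float(attrib["speed"]), lane)

sample = ('<vehicle id="carflow.0" x="-9897.274589" y="-8.250000" '
          'speed="49.840822" lane="section1_0" />')
elem = ET.fromstring(sample)
print(vehicle_row(elem.attrib, 2.0))  # (2.0, 0, -9897.274589, -8.25, 49.840822, 0)
```

Taking the suffix after the last separator keeps the conversion correct even if a lane name such as `section_1_0` contained an extra underscore, where `split('_')[1]` would pick the wrong part.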

I'd like to parse it into a pandas DataFrame. My code (ET is lxml.etree):

import pandas as pd
from lxml import etree as ET

def parseXML(filename):
    df = pd.DataFrame()
    old_time = 0.0
    time = 0.0
    events = ("end", "start")
    tree = ET.iterparse(filename, events=events)
    for event, elem in tree:
        if elem.tag == "timestep" and event == "start":
            # remember the time of the current timestep
            time = float(elem.attrib.get('time'))
        elif elem.tag == "timestep" and event == "end":
            # drop the children of the finished timestep
            elem.clear()
        elif elem.tag == 'vehicle' and event == "end":
            # "carflow.0" -> 0, "section1_0" -> 0
            vid = int(elem.attrib.get('id').split('.')[1])
            x = float(elem.attrib.get('x'))
            y = float(elem.attrib.get('y'))
            speed = float(elem.attrib.get('speed'))
            lane = int(elem.attrib.get('lane').split('_')[1])
            data = pd.DataFrame([time, vid, x, y, speed, lane]).T
            elem.clear()
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df = pd.concat([df, data])
        # print the time once whenever it reaches a new multiple of 50
        if time % 50 == 0 and time != old_time:
            old_time = time
            print(time)
    df.columns = ['time', 'id', 'x', 'y', 'speed', 'lane']
    return df
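`elem.clear()` only empties an element: the emptied element stays attached to its parent, so with millions of `timestep`s the root element still accumulates children. A common workaround is to keep a reference to the root from the first `start` event and clear it after each finished `timestep`. A small sketch with the standard-library `iterparse` (lxml's `iterparse` yields the same `(event, elem)` pairs) on a hypothetical in-memory document wrapped in a `<root>` element, counting how many children the root still holds with and without that extra step:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical test document in the question's template: 1000 timesteps, 3 vehicles each.
DOC = "<root>" + "".join(
    f'<timestep time="{t}.00">' + "".join(
        f'<vehicle id="carflow.{v}" x="0" y="0" speed="1" lane="section1_0" />'
        for v in range(3)) + "</timestep>"
    for t in range(1000)) + "</root>"

def leftover_children(clear_root):
    """Stream DOC and return how many children the root still holds at the end."""
    root = None
    for event, elem in ET.iterparse(io.BytesIO(DOC.encode()), events=("start", "end")):
        if root is None:          # the very first event is the root's "start"
            root = elem
        if event == "end" and elem.tag == "timestep":
            elem.clear()          # what the question's code does
            if clear_root:
                root.clear()      # also detach the emptied timesteps from the root
    return len(root)

print(leftover_children(False), leftover_children(True))  # 1000 0
```

With lxml specifically, elements also offer `getprevious()` and `getparent()`, which are often used to delete already-processed siblings instead of clearing the whole root.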

Is there a way to improve my code, in particular to make it faster?
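One common approach, offered as a sketch under stated assumptions: keep the streaming `iterparse`, but append plain Python values to per-column lists and build the DataFrame once at the end. Growing a frame one row at a time (`df.append` or `pd.concat` inside the loop) copies the whole frame on every row, so the total work grows quadratically with the number of vehicles. The sketch uses the standard library's `xml.etree.ElementTree.iterparse` so it runs without lxml; `lxml.etree.iterparse` yields the same `(event, elem)` pairs. The function name `parse_vehicles` is hypothetical, and it returns a dict of lists so the pandas step stays separate.

```python
import io
import xml.etree.ElementTree as ET

COLUMNS = ("time", "id", "x", "y", "speed", "lane")

def parse_vehicles(source):
    """Stream <timestep>/<vehicle> elements into column lists (hypothetical helper).

    `source` is a filename or a binary file object. Returns a dict of lists,
    which pd.DataFrame(...) accepts directly.
    """
    cols = {name: [] for name in COLUMNS}
    time = 0.0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if root is None:
            root = elem                       # first event: start of the root element
        if event == "start" and elem.tag == "timestep":
            time = float(elem.get("time"))
        elif event == "end" and elem.tag == "vehicle":
            cols["time"].append(time)
            cols["id"].append(int(elem.get("id").rsplit(".", 1)[1]))
            cols["x"].append(float(elem.get("x")))
            cols["y"].append(float(elem.get("y")))
            cols["speed"].append(float(elem.get("speed")))
            cols["lane"].append(int(elem.get("lane").rsplit("_", 1)[1]))
        elif event == "end" and elem.tag == "timestep":
            root.clear()                      # detach finished timesteps to keep memory flat

    return cols

# Hypothetical sample: the timesteps are wrapped in one root element,
# as any well-formed XML file requires.
sample = b"""<root>
<timestep time="2.00">
    <vehicle id="carflow.0" x="-9897.274589" y="-8.250000" speed="49.840822" lane="section1_0" />
    <vehicle id="carflow.1" x="-9890.0" y="-8.25" speed="50.0" lane="section1_1" />
</timestep>
<timestep time="3.00">
    <vehicle id="carflow.0" x="-9847.4" y="-8.25" speed="49.9" lane="section1_0" />
</timestep>
</root>"""
cols = parse_vehicles(io.BytesIO(sample))
print(cols["time"], cols["id"], cols["lane"])  # [2.0, 2.0, 3.0] [0, 1, 0] [0, 1, 0]
```

With pandas installed, `pd.DataFrame(parse_vehicles("big.xml"))` (where `big.xml` is a placeholder filename) then builds the frame in one step with the same column names as the question's `df`, keeping `id` and `lane` as integers rather than floats.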

0 answers