cElementTree.iterparse not using RAM

Date: 2016-06-27 18:15:16

Tags: python xml mongodb openstreetmap sax

I am parsing the OpenStreetMap planet file (planet-latest.osm) and writing its contents to MongoDB:

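try:
    from xml.etree.cElementTree import iterparse  # C implementation (Python 2.7)
except ImportError:
    from xml.etree.ElementTree import iterparse   # cElementTree was removed in Python 3.9

# iterparse accepts a filename directly and streams the file event by event
context = iterparse('planet-latest.osm', events=('start', 'end'))
event, root = next(context)  # first event: start of the root <osm> element

nodeRec = None

for event, elem in context:
    name = elem.tag
    attrs = elem.attrib

    if event == 'start':
        if name == 'node':
            # [lon, lat] order, as MongoDB geospatial indexes expect
            nodeRec = {'_id': int(attrs['id']),  # int auto-promotes to long on Python 2
                       'geoPoint': [float(attrs['lon']), float(attrs['lat'])],
                       'tags': []}
        elif name == 'tag' and nodeRec is not None:
            k = attrs['k']
            v = attrs['v']
            if k == 'name':
                nodeRec['name'] = v
            elif k == 'comment':
                nodeRec['comment'] = v
            else:
                nodeRec['tags'].append({'name': k, 'value': v})

    elif event == 'end':
        if name == 'node':
            pass  # write nodeRec to MongoDB (omitted in the original)
            nodeRec = None
    elem.clear()   # free the element's attributes and children
    root.clear()   # detach processed children from the root so they can be garbage-collected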
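The streaming pattern above can be exercised without the full planet file. Below is a sketch, assuming Python's standard `xml.etree.ElementTree` (whose `iterparse` has the same interface as `cElementTree`'s), that wraps the same logic in a generator and runs it over a small hand-written OSM fragment; `iter_nodes` and `SAMPLE` are illustrative names, not from the original. Unlike the original loop, it clears elements only on `end` events: the ElementTree documentation guarantees an element's attributes at `start`, but its text and children are only complete at `end`.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written OSM fragment (illustrative only, not real planet data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5" lon="-0.1">
    <tag k="name" v="Example"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="48.8" lon="2.3"/>
  <way id="10">
    <nd ref="1"/>
    <tag k="name" v="Not a node"/>
  </way>
</osm>
"""


def iter_nodes(source):
    """Yield one dict per <node>, freeing parsed elements as we go."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # start event of the <osm> root
    record = None
    for event, elem in context:
        if event == "start":
            if elem.tag == "node":
                a = elem.attrib  # attributes are defined at 'start'
                record = {"_id": int(a["id"]),
                          "geoPoint": [float(a["lon"]), float(a["lat"])],
                          "tags": []}
            elif elem.tag == "tag" and record is not None:
                k, v = elem.get("k"), elem.get("v")
                if k in ("name", "comment"):
                    record[k] = v
                else:
                    record["tags"].append({"name": k, "value": v})
        else:  # "end": the element is complete
            if elem.tag == "node":
                yield record
                record = None
            elem.clear()   # drop the finished element's contents
            root.clear()   # detach finished children from the root


nodes = list(iter_nodes(io.BytesIO(SAMPLE)))
```

Yielding records instead of writing to the database inside the loop keeps parsing and writing separate, so the writes can be grouped or moved elsewhere without touching the parser.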

Here is the top of the output of the "top" command:

PID   USER      PR   NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
14365 root      20   0   384m  88m  15m R  74.8  0.2 398:18.80 python
1975  mongodb   20   0   378g  12g  11g S  32.2 31.7 164:52.52 mongod
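A single `top` snapshot shows one moment of a multi-hour run. To check that the parser's memory really stays flat, one option is to sample the process's peak resident set size from inside the script. A minimal sketch using the standard-library `resource` module (Unix only, which fits the Ubuntu machine described here); `peak_rss_mib` is a hypothetical helper name:

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process so far, in MiB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0
```

Printing `peak_rss_mib()` every few million events inside the parsing loop would show whether memory grows over time; with `elem.clear()` and `root.clear()` in place it is expected to stay roughly constant.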

Why is Python using so little RAM? Can I make the parsing faster by letting it use all of the memory?
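Judging from the `top` output, the Python process is busy (about 75% CPU) while holding only 88 MB resident. That is consistent with the code's design: `iterparse` streams the file, and `elem.clear()`/`root.clear()` discard each element once it has been read, so memory stays small no matter how much RAM is free. More RAM therefore likely won't speed up the parse by itself. If each node is written individually at the `#write to mongodb` step, the per-document round trip may be a more promising cost to cut. One sketch, assuming pymongo 3.x or later (whose `Collection.insert_many` takes a list of documents and an `ordered` flag), groups records into batches; `batched`, `write_in_batches` and `FakeCollection` are hypothetical names, and `FakeCollection` only stands in for a real collection so the example runs without a server:

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_in_batches(records, collection, size=1000):
    """Insert records with one call per batch instead of one per document.

    `collection` is assumed to be a pymongo Collection (3.x+); only its
    insert_many(documents, ordered=...) method is used.
    """
    total = 0
    for batch in batched(records, size):
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total


class FakeCollection(object):
    """Hypothetical stand-in that records batch sizes instead of writing."""

    def __init__(self):
        self.calls = []

    def insert_many(self, documents, ordered=True):
        self.calls.append(len(documents))


fake = FakeCollection()
written = write_in_batches(({"_id": i} for i in range(2500)), fake, size=1000)
```

With a real server, `collection` would be a pymongo `Collection` and `records` a generator of node dicts from the parser. `ordered=False` tells the server to attempt every document in the batch rather than stopping at the first error (for example a duplicate `_id`).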

The machine has 40 GB of RAM and runs Ubuntu, Python 2.7 and MongoDB 2.7.3.
