I'm parsing the OpenStreetMap planet file (planet-latest.osm) and writing its contents to MongoDB:
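```python
from xml.etree.cElementTree import iterparse

# iterparse streams the file: on 'start' the attributes are available,
# on 'end' the element (including its children) is complete.
context = iterparse(open('planet-latest.osm', 'rb'), events=('start', 'end'))
event, root = next(context)  # the first event is the start of the root <osm> element
nodeRec = None
for event, elem in context:
    name = elem.tag
    attrs = elem.attrib
    if event == 'start':
        if name == 'node':
            nodeRec = {'_id': long(attrs['id']),
                       'geoPoint': [float(attrs['lon']), float(attrs['lat'])],
                       'tags': []}
        elif name == 'tag' and nodeRec is not None:
            k = attrs['k']
            v = attrs['v']
            if k == 'name':
                nodeRec['name'] = v
            elif k == 'comment':
                nodeRec['comment'] = v
            else:
                nodeRec['tags'].append({'name': k, 'value': v})
    elif event == 'end':
        if name == 'node':
            # ... write nodeRec to MongoDB
            nodeRec = None
        elem.clear()  # free the finished element's children and attributes
        root.clear()  # drop the root's references so finished elements can be freed
```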
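The MongoDB write step is elided above. If it issues one insert per node, buffering documents and inserting them in batches is a common way to reduce per-call overhead. This is a minimal sketch that assumes pymongo 3+, where `Collection.insert_many` is available. The helper names `batched` and `write_in_batches`, the batch size of 1000, and the collection `db.nodes` are all hypothetical. The demo passes a plain list's `append` in place of a real collection, so it runs without a database.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable` (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_batches(records, insert_many, size=1000):
    """Call `insert_many(list_of_docs)` once per batch instead of once per record.

    With pymongo 3+ this could be e.g. `db.nodes.insert_many` (the collection
    name is an assumption); any callable that accepts a list works here.
    Returns the number of batches written.
    """
    batches = 0
    for chunk in batched(records, size):
        insert_many(chunk)
        batches += 1
    return batches


# Demo without a database: collect each batch in a list.
received = []
n = write_in_batches(({"_id": i} for i in range(2500)), received.append, size=1000)
print(n, [len(b) for b in received])
```

In the parsing loop, `records` would be a generator that yields each finished `nodeRec`, so only one batch is held in memory at a time.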
Here are the top lines of the `top` output:
```
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
14365 root      20   0  384m  88m  15m R 74.8  0.2 398:18.80 python
 1975 mongodb   20   0  378g  12g  11g S 32.2 31.7 164:52.52 mongod
```
Why does Python use so little RAM? Could I speed up the parsing by letting it use all of the available memory?
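One way to see where the small Python footprint comes from is to measure it. Because the loop clears each element after its `end` event, the parser should only hold a small window of the document at any time. That would be consistent with the small RES figure for `python` in the `top` output. This is a minimal sketch under a few assumptions: Python 3.4+ (for `tracemalloc`), a synthetic OSM-like document instead of the real planet file, and hypothetical helper names `make_osm` and `peak_bytes_while_parsing`. It compares peak traced allocations with and without the `clear()` calls.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_osm(n_nodes):
    """Build a synthetic OSM-like document (test data only, not real OSM)."""
    parts = ['<osm version="0.6">']
    for i in range(n_nodes):
        parts.append('<node id="%d" lat="1.0" lon="2.0">'
                     '<tag k="name" v="n%d"/></node>' % (i, i))
    parts.append('</osm>')
    return ''.join(parts).encode('utf-8')


def peak_bytes_while_parsing(data, clear):
    """Stream-parse `data`; return (node_count, peak traced bytes during the parse)."""
    tracemalloc.start()
    try:
        context = ET.iterparse(io.BytesIO(data), events=('start', 'end'))
        _, root = next(context)  # start event of the root <osm> element
        count = 0
        for event, elem in context:
            if event != 'end':
                continue
            if elem.tag == 'node':
                count += 1
            if clear:
                elem.clear()  # release this element's children and attributes
                root.clear()  # detach finished elements from the root
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak


data = make_osm(20000)
kept = peak_bytes_while_parsing(data, clear=False)
cleared = peak_bytes_while_parsing(data, clear=True)
print("without clear(): %d nodes, peak %d bytes" % kept)
print("with clear():    %d nodes, peak %d bytes" % cleared)
```

Without the `clear()` calls, the whole tree stays attached to `root`, so the peak grows with the document size. With them, the peak should stay roughly flat however many nodes are parsed.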
The machine has 40 GB of RAM and runs Ubuntu, Python 2.7, and MongoDB 2.7.3.