Converting non-tabular/chunked data into a nested dictionary in Python

Time: 2014-06-18 07:07:15

Tags: python data-structures

I have chunked data that looks like this:

>Head1
foo 0 1.10699e-05 2.73049e-05
bar 0.939121 0.0173732 0.0119144
qux 0 2.34787e-05 0.0136463

>Head2
foo 0 0.00118929 0.00136993
bar 0.0610655 0.980495 0.997179
qux 0.060879 0.982591 0.974276

Each chunk is whitespace-separated. What I want to do is convert the chunks into a nested dictionary, like this:

{ 'Head1': {'foo': '0 1.10699e-05 2.73049e-05',
            'bar': '0.939121 0.0173732 0.0119144',
            'qux': '0 2.34787e-05 0.0136463'},
  'Head2': {'foo': '0 0.00118929 0.00136993',
             'bar': '0.0610655 0.980495 0.997179',
             'qux': '0.060879 0.982591 0.974276'}
}

What is the way to do this in Python? I'm not sure how to proceed from here:

def parse():
    caprout = "tmp.txt"
    with open(caprout, 'r') as file:
        datalines = (ln.strip() for ln in file)
        for line in datalines:
            if line.startswith(">Head"):
                print(line)  # block header
            elif not line.strip():
                print(line)  # blank line between blocks
            else:
                print(line)  # data line
    return

def main():
    parse()
    return

if __name__ == '__main__':
    main()

2 answers:

Answer 0: (score: 1)

This is the simplest solution I can think of:

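filename = "tmp.txt"  # the input file from the question
mainDict = dict()
with open(filename, 'r') as file:
    for line in file:
        line = line.strip()
        if line == "":
            continue  # skip the blank lines between blocks
        # str.find() returns -1 (truthy) when the text is absent,
        # so test the prefix with startswith() instead
        if line.startswith(">Head"):
            lastBlock = line[1:]  # drop the leading '>'
            mainDict[lastBlock] = dict()
            continue
        # partition() splits on the first space: (key, ' ', rest of line)
        splitLine = line.partition(" ")
        mainDict[lastBlock][splitLine[0]] = splitLine[2]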
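The loop above can also be packaged as a function that accepts any iterable of lines, so it can be tried on an in-memory sample as well as on an open file. This is only a sketch: the name parse_blocks is hypothetical, and it assumes that every header line starts with ">" and that every data line is a key followed by a space and the values.

```python
def parse_blocks(lines):
    """Build {header: {key: rest_of_line}} from '>'-headed blocks."""
    result = {}
    current = None  # inner dict of the block currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines only separate blocks
        if line.startswith(">"):
            # New block: strip the '>' and start (or reuse) its inner dict.
            current = result.setdefault(line[1:], {})
            continue
        if current is None:
            raise ValueError("data line before any header: %r" % line)
        # Split on the first space only; the rest stays one string.
        key, _, rest = line.partition(" ")
        current[key] = rest.strip()
    return result


# Shortened sample in the same layout as the question's data.
sample = """\
>Head1
foo 0 1.10699e-05 2.73049e-05
bar 0.939121 0.0173732 0.0119144

>Head2
qux 0.060879 0.982591 0.974276
"""
parsed = parse_blocks(sample.splitlines())
print(parsed["Head1"]["foo"])
```

To read from a file instead, the open file object can be passed directly, for example parse_blocks(open("tmp.txt")) inside a with block.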

Answer 1: (score: 1)

File:

[sgeorge@sgeorge-ld1 tmp]$ cat tmp.txt 
>Head1
foo 0 1.10699e-05 2.73049e-05
bar 0.939121 0.0173732 0.0119144
qux 0 2.34787e-05 0.0136463

>Head2
foo 0 0.00118929 0.00136993
bar 0.0610655 0.980495 0.997179
qux 0.060879 0.982591 0.974276

Script:

[sgeorge@sgeorge-ld1 tmp]$ cat a.py 
import json
dict_ = {}

def parse():
    caprout = "tmp.txt"
    with open(caprout, 'r') as file:
        datalines = (ln.strip() for ln in file)
        for line in datalines:
            if line != '':
                if line.startswith(">Head"):
                    # header line: start a new inner dict, dropping the '>'
                    key = line.replace('>', '')
                    dict_[key] = {}
                else:
                    # data line: first token is the key, the rest is the value
                    nested_key, value = line.split(' ', 1)
                    dict_[key][nested_key] = value
    print(json.dumps(dict_))
parse()

Execution:

[sgeorge@sgeorge-ld1 tmp]$ python a.py  | python -m json.tool
{
    "Head1": {
        "bar": "0.939121 0.0173732 0.0119144",
        "foo": "0 1.10699e-05 2.73049e-05",
        "qux": "0 2.34787e-05 0.0136463"
    },
    "Head2": {
        "bar": "0.0610655 0.980495 0.997179",
        "foo": "0 0.00118929 0.00136993",
        "qux": "0.060879 0.982591 0.974276"
    }
}
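The pipe through python -m json.tool is only there for pretty-printing. json.dumps can produce similar indented, key-sorted output on its own through its indent and sort_keys arguments. A small sketch, using a hard-coded subset of the data as a stand-in for the dictionary that parse() builds:

```python
import json

# Stand-in for the dictionary built by parse(); values copied from tmp.txt.
dict_ = {
    "Head1": {"foo": "0 1.10699e-05 2.73049e-05",
              "bar": "0.939121 0.0173732 0.0119144"},
    "Head2": {"foo": "0 0.00118929 0.00136993"},
}

# indent=4 nests each level by four spaces; sort_keys=True orders keys
# alphabetically, so "bar" is printed before "foo".
pretty = json.dumps(dict_, indent=4, sort_keys=True)
print(pretty)
```

Sorting the keys also keeps the output stable across runs and Python versions, which helps when comparing results.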