Python: Converting a table with missing data and an irregular layout into a dictionary

Time: 2018-04-09 17:52:22

Tags: python ascii python-2.6

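In Python 2.6, what is the Pythonic way to parse an ASCII table whose columns have irregular widths, missing data, and indented sub-entries, and convert it into a dictionary?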

col1      col2  col3       col4 
A         B                D          
 A1       B1    C1          
 A2       B2               

The desired output would look something like this:
{"col1":"A","col2":"B","col3":"","col4":"D",
"indented_entries":[
  {"col1":"A1","col2":"B1","col3":"C1","col4":""},
  {"col1":"A2","col2":"B2","col3":"","col4":""}
]}

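You could of course iterate over fixed character-width columns, but my question is really whether anyone has an elegant idea for the implementation.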
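The fixed-width iteration mentioned above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function names `column_spans` and `parse_table` and the `child_key` parameter are hypothetical. It assumes that every value starts at or after the first character of its header word and ends before the next header word begins, and that an indented row belongs to the nearest non-indented row above it. It avoids dict comprehensions so that it should also run on Python 2.6.

```python
import re


def column_spans(header):
    """Return (name, start, end) for each header word.

    A column runs from the start of its header word to the start of the
    next one; the last column is open-ended (end is None).
    """
    matches = list(re.finditer(r"\S+", header))
    spans = []
    for i, m in enumerate(matches):
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = None
        spans.append((m.group(), m.start(), end))
    return spans


def parse_table(text, child_key="indented_entries"):
    """Slice each row by the header's column spans and nest indented rows."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    spans = column_spans(lines[0])
    records = []
    for line in lines[1:]:
        # Slicing past the end of a short line yields '', i.e. a missing cell.
        row = dict((name, line[start:end].strip()) for name, start, end in spans)
        if line[0].isspace() and records:
            # Indented row: attach it to the most recent top-level row.
            records[-1][child_key].append(row)
        else:
            # Top-level row (or an indented row with no parent before it).
            row[child_key] = []
            records.append(row)
    return records
```

Called on the small four-column table above, `parse_table` should return a single record for row `A`, with the `A1` and `A2` rows under `indented_entries`. Tables whose values are right-aligned or drift left of their header word would break the span assumption and need explicit column offsets instead.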

Real-life scenario

I have an ASCII table with a number of columns and some irregular patterns (indented rows). The data looks like this:

ty      eq  status       use state  ord   capacity      free    ra  part high low
ma      10  m----3---r-   1% on          558.912G  555.888G    1M    16   80% 70%
 mm      11               1% on       0  558.912G  558.894G  [586042464 inodes]
 mr      12               1% on       1  558.912G  555.888G

I want to parse this data into a dictionary so that it looks like this:

{
  "ty": "ma",
  "eq": "10",
  "status": "m----3---r-",
  "use": "1%",
  "state": "on",
  "capacity": "558.912G",
  "free": "555.888G",
  "ra": "1M",
  "part": "16",
  "high": "80%",
  "low": "70%",
  "_ty": [
    {
      "ty": "mm",
      "eq": "11",
      "use": "1%",
      "state": "on",
      "ord": "0",
      "capacity": "558.912G",
      "free": "558.894G [586042464 inodes]"
    },
    {
      "ty": "mr",
      "eq": "12",
      "use": "1%",
      "state": "on",
      "ord": "1",
      "capacity": "558.912G",
      "free": "555.888G"
    }
  ]
}

I would like to know an elegant way to do this in Python (2.6). Any ideas?

I would also consider a solution that ignores the "sub-rows" and returns only the top-level rows.

Some of my attempts:

def panda_conversion(filename):
        import pandas
        # Raw string so that '\s' is not read as a string escape.
        return pandas.read_csv(filename, sep=r'\s+').T.to_dict()


def convert_table_to_dict(filename):
        import csv
        # Every single space counts as a delimiter here.
        with open(filename) as f:
                return list(csv.DictReader(f, delimiter=' '))


def convert_ascii_table(filename):
        import asciitable
        # Explicit start/end offsets for each of the 12 columns.
        return asciitable.read(filename, Reader=asciitable.FixedWidth,
                col_starts=( 0, 8,  12, 25, 29, 37, 41, 51, 61, 69, 73, 78 ),
                col_ends  =( 2, 11, 23, 28, 34, 40, 50, 60, 68, 71, 76, 81 ))

if __name__ == '__main__':
        import sys
        import json
        filename = sys.argv[1]
        print( panda_conversion( filename ) )
        print( json.dumps( convert_table_to_dict( filename ), indent=1))
        print( convert_ascii_table( filename ) )

By the way, going by the comments, I did not get this across properly: you do not need to post code. Ideas alone are enough. Thanks.

1 answer:

Answer 0 (score: 0)

This works, but it is very inflexible.

def iterate(filename):
        d = {}
        with open(filename, 'r') as f:
                header = f.readline().split()
                # Top-level rows leave the "ord" column blank.
                header_parent = [e for e in header if e != "ord"]
                # Indented rows leave these columns blank.
                header_child = [e for e in header if e not in ("status", "part", "high", "low")]

                for line in f:
                        values = line.split()
                        if not line[0] == ' ':
                                for i, column in enumerate(header_parent):
                                        d[column] = values[i]
                        else:
                                # Child values go under "_"-prefixed keys; each
                                # indented row overwrites the previous one.
                                for i, column in enumerate(header_child):
                                        d["_" + column] = values[i] if i < len(values) else ''

        return d

Do you know how to improve this?
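One possible direction, sketched here under the same assumptions as the code above, is to keep the whitespace split but collect the indented rows in a list under the parent instead of overwriting keys. The function name `parse_grouped` and its parameters `parent_skip`, `child_skip`, and `child_key` are hypothetical. The sketch assumes you know in advance which columns are blank in each kind of row, and it folds any surplus tokens into the last column so that `[586042464 inodes]` stays attached to `free`. A blank cell in any other position would still shift the values, so this is no more robust than whitespace splitting allows.

```python
def parse_grouped(lines, parent_skip=("ord",),
                  child_skip=("status", "ra", "part", "high", "low"),
                  child_key="_ty"):
    """Return a list of top-level dicts, each with its indented rows nested."""
    lines = [line for line in lines if line.strip()]
    header = lines[0].split()
    # Columns assumed to be filled in for each kind of row.
    parent_cols = [c for c in header if c not in parent_skip]
    child_cols = [c for c in header if c not in child_skip]

    records = []
    for line in lines[1:]:
        # An indented row is a child only if a parent has been seen.
        is_child = line[0].isspace() and bool(records)
        cols = child_cols if is_child else parent_cols
        tokens = line.split()
        if len(tokens) > len(cols):
            # Fold surplus tokens into the last column.
            last = len(cols) - 1
            tokens = tokens[:last] + [" ".join(tokens[last:])]
        # zip() stops at the shorter sequence, so trailing blank cells
        # are simply left out of the row.
        row = dict(zip(cols, tokens))
        if is_child:
            records[-1].setdefault(child_key, []).append(row)
        else:
            records.append(row)
    return records
```

It takes any iterable of lines, so an open file can be passed directly, for example `parse_grouped(open(filename))`. To return only the top-level rows, the indented lines could be filtered out before calling it.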