What is the Pythonic way, in Python 2.6, to parse an ASCII table with irregular column widths, missing data, and indented sub-entries into a dictionary?
    col1  col2  col3  col4
    A     B           D
      A1  B1    C1
      A2  B2
The desired output would look something like this:
    {"col1":"A","col2":"B","col3":"","col4":"D",
     "indented_entries":[
       {"col1":"A1","col2":"B1","col3":"C1","col4":""},
       {"col1":"A2","col2":"B2","col3":"","col4":""}
     ]}
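While you could iterate over fixed character-width columns, my question is really whether anyone has an elegant idea for the implementation.
Real-life scenario
I have an ASCII table with several columns and some irregular patterns (indented rows). The data looks like this: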
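For reference, the fixed-width iteration I have in mind could be sketched roughly as follows. This is only a minimal sketch: the helper names `column_slices` and `slice_row` are made up for illustration, and it assumes that the header starts in column 0, that every value begins at or after the start of its header label, and that no value runs on underneath the next label.

```python
import re

def column_slices(header):
    """Return [(name, start, end)]: each column runs from the start of its
    label to the start of the next label; the last column is open-ended."""
    starts = [(m.group(), m.start()) for m in re.finditer(r'\S+', header)]
    ends = [start for _, start in starts[1:]] + [None]
    return [(name, start, end) for (name, start), end in zip(starts, ends)]

def slice_row(line, slices):
    """Cut one line at the column boundaries; missing cells become ''."""
    return dict((name, line[start:end].strip()) for name, start, end in slices)

# Sample layout (assumed spacing): labels start at offsets 0, 6, 12 and 18.
header = "col1  col2  col3  col4"
rows = [
    "A     B           D",
    "  A1  B1    C1",
    "  A2  B2",
]

slices = column_slices(header)
for row in rows:
    print(slice_row(row, slices))
```

Because the slices come from the header, a missing value simply yields an empty string instead of shifting the later values to the left. What this does not do yet is attach the indented rows to their parent row.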
    ty   eq  status       use  state  ord  capacity  free      ra  part  high  low
    ma   10  m----3---r-  1%   on          558.912G  555.888G  1M  16    80%   70%
     mm  11               1%   on     0    558.912G  558.894G  [586042464 inodes]
     mr  12               1%   on     1    558.912G  555.888G
I want to parse this data into a dictionary, so that it looks like this:
    {
        "ty": "ma",
        "eq": "10",
        "status": "m----3---r-",
        "use": "1%",
        "state": "on",
        "capacity": "558.912G",
        "free": "555.888G",
        "ra": "1M",
        "part": "16",
        "high": "80%",
        "low": "70%",
        "_ty": [
            {
                "ty": "mm",
                "eq": "11",
                "use": "1%",
                "state": "on",
                "ord": "0",
                "capacity": "558.912G",
                "free": "558.894G [586042464 inodes]"
            },
            {
                "ty": "mr",
                "eq": "12",
                "use": "1%",
                "state": "on",
                "ord": "1",
                "capacity": "558.912G",
                "free": "555.888G"
            }
        ]
    }
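I would like to know an elegant way to do this in Python (2.6). Any ideas?
I would also consider a solution that ignores the "sub-rows" and returns only the top-level rows.
Some of my attempts: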
    def panda_conversion(filename):
        import pandas
        # Splits on runs of whitespace, so an empty cell shifts later values left.
        return pandas.read_csv(filename, sep=r'\s+').T.to_dict()

    def convert_table_to_dict(filename):
        import csv
        # With a single-space delimiter, every extra space becomes an empty field.
        return list(csv.DictReader(open(filename), delimiter=' '))

    def convert_ascii_table(filename):
        import asciitable
        # Column offsets are hard-coded for this particular table.
        return asciitable.read(filename, Reader=asciitable.FixedWidth,
            col_starts=(0, 8, 12, 25, 29, 37, 41, 51, 61, 69, 73, 78),
            col_ends=(2, 11, 23, 28, 34, 40, 50, 60, 68, 71, 76, 81))

    if __name__ == '__main__':
        import sys
        import json
        filename = sys.argv[1]
        print(panda_conversion(filename))
        print(json.dumps(convert_table_to_dict(filename), indent=1))
        print(convert_ascii_table(filename))
By the way, as the comments point out, I have not got this right yet. You don't need to post code; ideas alone are enough. Thanks.
Answer 0 (score: 0)
This works, but it is very inflexible.
    def iterate(filename):
        d = {}
        f = open(filename, 'r')
        header = f.readline().split()
        # Hard-coded assumption: top-level rows never have an "ord" value.
        header_parent = list(header)
        header_parent.remove("ord")
        # Hard-coded assumption: indented rows never have these four columns.
        header_child = [e for e in header if e not in ("status", "part", "high", "low")]
        for line in f:
            fields = line.split()
            if not line[0] == ' ':
                for columns in header_parent:
                    d[columns] = fields[header_parent.index(columns)]
            else:
                # Child values go under "_"-prefixed keys; each child row
                # overwrites the values of the previous one.
                for columns in header_child:
                    i = header_child.index(columns)
                    d["_" + columns] = '' if i >= len(fields) else fields[i]
        f.close()
        return d
Do you have any ideas for improving this?
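One direction that might be less rigid is to take the column boundaries from the header line itself and to use the leading whitespace to decide whether a row is a child of the previous top-level row. The following is a rough sketch rather than a finished solution. All names in it (`column_slices`, `parse_row`, `parse_table`, `NOTE`, `SAMPLE`) are made up for illustration, and the spacing in `SAMPLE` is an assumed layout. It assumes that every value starts at or after its header label and does not run under the next label, and that a trailing bracketed note such as `[586042464 inodes]` belongs to the last value before it. It avoids syntax newer than Python 2.6 (for example, no dict comprehensions).

```python
import json
import re

# Assumed layout: each value starts under its header label.
SAMPLE = """\
ty   eq  status       use  state  ord  capacity  free      ra  part  high  low
ma   10  m----3---r-  1%   on          558.912G  555.888G  1M  16    80%   70%
 mm  11               1%   on     0    558.912G  558.894G  [586042464 inodes]
 mr  12               1%   on     1    558.912G  555.888G
"""

# A bracketed annotation at the end of a line, e.g. "[586042464 inodes]".
NOTE = re.compile(r'\s*(\[[^\]]*\])\s*$')

def column_slices(header):
    """Return [(name, start, end)] derived from where the labels start."""
    starts = [(m.group(), m.start()) for m in re.finditer(r'\S+', header)]
    ends = [start for _, start in starts[1:]] + [None]
    return [(name, start, end) for (name, start), end in zip(starts, ends)]

def parse_row(line, slices):
    """Slice one line into a dict, leaving out columns that are empty."""
    note = NOTE.search(line)
    if note:
        line = line[:note.start()]  # slice the row without the trailing note
    cells = [(name, line[start:end].strip()) for name, start, end in slices]
    cells = [(name, value) for name, value in cells if value]
    if note and cells:
        # Assumption: the note is attached to the last non-empty value.
        name, value = cells[-1]
        cells[-1] = (name, value + ' ' + note.group(1))
    return dict(cells)

def parse_table(text, child_key='_ty', include_children=True):
    """Return a list of top-level rows; indented rows are nested under the
    preceding top-level row in a list stored at child_key."""
    lines = [l.rstrip() for l in text.splitlines() if l.strip()]
    slices = column_slices(lines[0])
    result = []
    for line in lines[1:]:
        row = parse_row(line, slices)
        if not line[0].isspace():
            result.append(row)
        elif include_children and result:
            result[-1].setdefault(child_key, []).append(row)
    return result

print(json.dumps(parse_table(SAMPLE), indent=1))
```

Calling `parse_table(SAMPLE, include_children=False)` gives the top-level-only variant mentioned in the question. If the real output right-aligns some values, the boundaries taken from the header would need adjusting, for example by treating character positions that are blank in every line as the separators.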