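My input data is an OrderedDict that can contain nested OrderedDicts to a variable depth, so I chose to parse it recursively. The desired output is a CSV with a header row.
My code will work once I can build field_name correctly while traversing a branch after all the leaves of an earlier branch have been visited (i.e. Type_1.Field_3.Data is incorrectly called Type_1.Field_2.Field_3.Data).
Once the leaves on a branch are exhausted, I want to remove the last .Field_x from field_name so that a new (correct) one can be appended for the next object.
Does anyone know where I can add this behaviour? Thanks,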
...
def get_soql_fields(soql):
    soql_fields = re.search('(?<=select)(?s)(.*)(?=from)', soql)  # get fields
    soql_fields = re.sub(' ', '', soql_fields.group())  # remove extra spaces
    fields = re.split(',|\n|\r', soql_fields)  # split on commas and newlines
    fields = [field for field in fields if field != '']  # remove empty strings
    return fields

def parse_output(data, soql):
    fields = get_soql_fields(soql)
    header = fields
    master = [header]
    for record in data['records']:  # for each 'record' in response
        row = []
        for obj, value in record.iteritems():  # for each obj in record
            if isinstance(value, basestring):  # if query base object has desired fields
                if obj in fields:
                    row.append(value)
            elif isinstance(value, dict):  # traverse down into object
                path = obj
                row.append(_traverse_output(obj, value, fields, row, path))
        master.append(row)
    return master

def _traverse_output(obj, value, fields, row, path):
    for f, v in value.iteritems():  # for each item in obj
        if not isinstance(v, (dict, list, tuple)):
            field_name = '{path}.{name}'.format(path=path, name=f)  # TODO fix this to full field name
            print('FName: {0}'.format(field_name))
            if field_name in fields:
                print('match')
                row.append(v)
        elif isinstance(v, dict):  # it is a dict
            path += '.{obj}'.format(obj=f)
            _traverse_output(f, v, fields, row, path)

select
    Type_1.Field_1,
    Type_1.Field_2.Data,
    Type_1.Field_3,
    Type_1.Field_4,
    Type_1.Field_5.Data_1.Data,
    Type_1.Field_6,
    Type_2.Field_1,
    Type_2.Field_2
from
    Obj_1
limit
    1
;

{
    "records": [
        {
            "attributes": {
                "type": "Obj_1",
                "url": "<url>"
            },
            "Type_1": {
                "attributes": {
                    "type": "Type_1",
                    "url": "<url>"
                },
                "Field_1": "<stuff>",
                "Field_2": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data": "<data>"
                },
                "Field_3": "<data>",
                "Field_4": "<data>",
                "Field_5": {
                    "attributes": {
                        "type": "Field_2",
                        "url": "<url>"
                    },
                    "Data_1": {
                        "attributes": {
                            "type": "Data_1",
                            "url": "<url>"
                        },
                        "Data": "<data>"
                    }
                },
                "Field_6": 1.0
            },
            "Type_2": {
                "attributes": {
                    "type": "Type_2",
                    "url": "<url>"
                },
                "Field_1": "<data>",
                "Field_2": "<data>"
            }
        }
    ]
}
Answer 0 (score: 0)
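I worked out a quick solution for this. I'll jot down what I came up with first and attach the code I wrote at the end.
Basically, your problem is that you keep updating path in place with +=. Because path is reassigned inside the loop, the suffix added for one nested dict is still there when the loop reaches the next sibling key. Instead, do something like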
new_path = path + '.{obj}'.format(obj=f)
_traverse_output(f, v, fields, row, new_path)
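To make the difference concrete, here is a minimal sketch that collects leaf names both ways. It is written for Python 3 (so .items() instead of .iteritems()), and the function names leaf_names_buggy and leaf_names_fixed, as well as the trimmed-down type_1 sample, are made up for illustration and are not part of the original code.

```python
from collections import OrderedDict


def leaf_names_buggy(node, path):
    """Collect leaf names while reassigning `path` inside the loop (the bug)."""
    names = []
    for key, value in node.items():
        if isinstance(value, dict):
            path += '.{0}'.format(key)  # suffix leaks into later siblings
            names.extend(leaf_names_buggy(value, path))
        else:
            names.append('{0}.{1}'.format(path, key))
    return names


def leaf_names_fixed(node, path):
    """Collect leaf names, building a fresh path for each nested dict."""
    names = []
    for key, value in node.items():
        if isinstance(value, dict):
            new_path = '{0}.{1}'.format(path, key)  # `path` itself is untouched
            names.extend(leaf_names_fixed(value, new_path))
        else:
            names.append('{0}.{1}'.format(path, key))
    return names


# Hypothetical, trimmed-down version of the Type_1 object from the question.
type_1 = OrderedDict([
    ('Field_1', '<stuff>'),
    ('Field_2', OrderedDict([('Data', '<data>')])),
    ('Field_3', '<data>'),
])

buggy = leaf_names_buggy(type_1, 'Type_1')
fixed = leaf_names_fixed(type_1, 'Type_1')
print(buggy)  # last entry is 'Type_1.Field_2.Field_3'
print(fixed)  # last entry is 'Type_1.Field_3'
```

In the buggy variant the '.Field_2' suffix is still attached when the loop reaches Field_3; in the fixed variant each level only ever extends its own copy of the path.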
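One caveat about this fix: it will not necessarily produce a row whose values are in the same order as the header (i.e. if Type_1.Field_1 is at position 0 in the header list, its value will not necessarily be at position 0 of the row).
The simple way to fix this (and to deal with CSVs in general) is to use DictWriter from the csv module and pass an empty dictionary to the first call; its keys will be the field names and its values will be the corresponding values.
Another way to solve the problem is to pre-fill the row list with None or empty strings, then use the list.index method to assign each value to the appropriate position.
I wrote an implementation of _traverse_output as an example of each, although they differ slightly from your code: they take an element of the 'records' list.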
Dictionary example
def _traverse_output_with_dict(record, fields, row_values, field_name=''):
    for obj, value in record.iteritems():
        # Build the full dotted name without modifying field_name itself.
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print(new_field_name)
        if not isinstance(value, dict):
            if new_field_name in fields:
                row_values[new_field_name] = value
        else:
            _traverse_output_with_dict(value, fields, row_values, new_field_name)

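For the dictionary approach, the step that actually writes the CSV might look roughly like the sketch below. It is a Python 3 adaptation (.items() rather than .iteritems()), and collect_fields, records_to_csv and the sample data are hypothetical names and values chosen for the example. It writes to an in-memory buffer so that it runs on its own; a real file opened with newline='' should work the same way.

```python
import csv
import io
from collections import OrderedDict


def collect_fields(record, fields, row_values, field_name=''):
    """Fill row_values with {full_field_name: value} for the requested fields."""
    for key, value in record.items():
        full_name = '{0}.{1}'.format(field_name, key) if field_name else key
        if isinstance(value, dict):
            collect_fields(value, fields, row_values, full_name)
        elif full_name in fields:
            row_values[full_name] = value


def records_to_csv(records, fields):
    """Return CSV text with one row per record, columns in header order."""
    buffer = io.StringIO()
    # restval fills in any requested field that a record does not contain.
    writer = csv.DictWriter(buffer, fieldnames=fields, restval='',
                            lineterminator='\n')
    writer.writeheader()
    for record in records:
        row_values = {}
        collect_fields(record, fields, row_values)
        writer.writerow(row_values)
    return buffer.getvalue()


# Hypothetical sample data; key order deliberately differs from the header.
sample_records = [OrderedDict([
    ('Type_2', OrderedDict([('Field_2', 'b'), ('Field_1', 'a')])),
    ('Type_1', OrderedDict([
        ('Field_2', OrderedDict([('Data', 'x')])),
        ('Field_3', 'y'),
    ])),
])]
sample_fields = ['Type_1.Field_2.Data', 'Type_1.Field_3',
                 'Type_2.Field_1', 'Type_2.Field_2']

csv_text = records_to_csv(sample_records, sample_fields)
print(csv_text)
```

Because DictWriter orders the columns by fieldnames, the order in which the traversal happens to visit the keys no longer matters.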
List example
def _traverse_output_with_list(record, fields, row, field_name=''):
    # Pre-fill the row so every header position has a slot.
    while len(row) < len(fields):
        row.append('')
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print(new_field_name)
        if not isinstance(value, dict):
            if new_field_name in fields:
                # Put the value at the same position as its header.
                row[fields.index(new_field_name)] = value
        else:
            _traverse_output_with_list(value, fields, row, new_field_name)
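
As a variation on the list approach, the sketch below looks up column positions in a dict built once from the header instead of calling list.index for every value, then writes the result with csv.writer. This is a swapped-in detail, not what the code above does, and it assumes Python 3 and unique field names; fill_row, build_table and the sample data are hypothetical.

```python
import csv
import io
from collections import OrderedDict


def fill_row(record, positions, row, field_name=''):
    """Place each requested value at the column its header occupies."""
    for key, value in record.items():
        full_name = '{0}.{1}'.format(field_name, key) if field_name else key
        if isinstance(value, dict):
            fill_row(value, positions, row, full_name)
        elif full_name in positions:
            row[positions[full_name]] = value


def build_table(records, fields):
    """Return [header, row, row, ...] with every row aligned to the header."""
    positions = {name: index for index, name in enumerate(fields)}
    table = [list(fields)]
    for record in records:
        row = [''] * len(fields)  # pre-filled so missing fields stay blank
        fill_row(record, positions, row)
        table.append(row)
    return table


# Hypothetical sample data; Type_1.Field_1 is missing on purpose.
sample_records = [OrderedDict([
    ('Type_2', OrderedDict([('Field_1', 'd')])),
    ('Type_1', OrderedDict([
        ('Field_6', 1.0),
        ('Field_5', OrderedDict([('Data_1', OrderedDict([('Data', 'c')]))])),
    ])),
])]
sample_fields = ['Type_1.Field_1', 'Type_1.Field_5.Data_1.Data',
                 'Type_1.Field_6', 'Type_2.Field_1']

table = build_table(sample_records, sample_fields)

buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(table)
print(buffer.getvalue())
```

The first data column stays empty because the sample record has no Type_1.Field_1, which is the case the pre-filled row is meant to handle.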