Unable to parse a .CSV file into a dict

Time: 2017-08-22 18:31:40

Tags: python dictionary

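I have done some simple .csv parsing in Python before, but a new file structure is giving me trouble. The input file comes from a spreadsheet that was converted to .CSV. Here is a sample of the input (image: Layout)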

Each set can have many layouts, and each layout can have many layers. Each layer has exactly one layer name and one object name.

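Here is the code I am using to parse it. I suspect it's a logic/flow-control problem, because the files I parsed before were not nested this deeply. The code skips the first (header) row. Any help is appreciated!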

import csv
import pprint

def import_layouts_schema(layouts_schema_file_name='C:\\layouts\\LAYOUT1.csv'):

    class set_template:
        # Holds the SET / LAYOUT / LAYER / OBJECT values of the current row
        def __init__(self):
            self.set_name = ''
            self.layout_name = ''
            self.layer_name = ''
            self.obj_name = ''

    def check_layout(st, row, layouts_schema):
        c = 0
        if st.layout_name == '':
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name: st.obj_name}
            layout = {st.layout_name: layer}
            # update() replaces the whole value stored under st.set_name
            layouts_schema.update({st.set_name: layout})
        else:
            st.layout_name = row[c+1]
            st.layer_name = row[c+2]
            st.obj_name = row[c+3]
            layer = {st.layer_name: st.obj_name}
            layout = {st.layout_name: layer}
            layouts_schema.update({st.set_name: layout})
        return layouts_schema

    def layouts_schema_parsing(obj_list_raw1):  #, location_categories, image_schema, set_location):
        #------ init -----------------------------------
        skipfirst = True
        c = 0
        firstrow = True
        layouts_schema = {}
        end_flag = ''
        st = set_template()
        #---------- start parsing here -----------------
        print('Now parsing layouts schema list')
        for row in obj_list_raw1:
            #print('This is the row: ', row)
            if skipfirst == True:   # skip the header row
                skipfirst = False
                continue
            if row[c] != '':        # a non-empty first cell starts a new set
                st.set_name = row[c]
                st.layout_name = row[c+1]
                st.layer_name = row[c+2]
                st.obj_name = row[c+3]
                print('FOUND A NEW SET.  SET details below:')
                print('Set name:', st.set_name, 'Layout name:', st.layout_name, 'Layer name:', st.layer_name, 'Object name:', st.obj_name)
                if firstrow == True:
                    print('First row of layouts import!')
                    layer = {st.layer_name: st.obj_name}
                    layout = {st.layout_name: layer}
                    layouts_schema = {st.set_name: layout}
                firstrow = False
                check_layout(st, row, layouts_schema)
                continue
            elif firstrow == False:
                print('Not the first row of layout import')
                layer = {st.layer_name: st.obj_name}
                layout = {st.layout_name: layer}
                layouts_schema.update({st.set_name: layout})
                check_layout(st, row, layouts_schema)
        return layouts_schema

    #begin subroutine main
    layouts_schema_file_name = 'C:\\Users\\jason\\Documents\\RAY\\layout_schemas\\ANIBOT_LAYOUTS_SCHEMA.csv'
    full_path_to_file = layouts_schema_file_name
    print('============ Importing LAYOUTS schema from: ', full_path_to_file, ' ==============')
    with open(full_path_to_file, newline='') as openfile:  # newline='' as the csv docs recommend
        reader_ob = csv.reader(openfile)
        layout_list_raw1 = list(reader_ob)
    layouts_schema = layouts_schema_parsing(layout_list_raw1)
    print('=========== End of layouts schema import =========')

    return layouts_schema


layouts_schema = import_layouts_schema()

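Feel free to throw out any part that doesn't work. I suspect I'm a little in over my head. A for loop or another while loop might do it. In the end, I just want to parse the file into a dict with the same key structure. That is, the first row of the final dict would look like: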

{' RESTAURANT':{' RR_FACING1':{' BACKDROP':' restaurant1'}}}

The rest follows the same pattern. Eventually I want to use this key structure and dict for other purposes. I just can't get it parsed!
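One reason the code above loses data: `layouts_schema.update({st.set_name: layout})` replaces whatever was already stored under that set, so each row overwrites the layouts and layers added before it. A minimal sketch of merging instead, using `dict.setdefault` to create each nesting level only when it is missing; the helper name `add_layer` and the second layer's values (`LAYER_2`, `obj_2`) are hypothetical placeholders, not part of the original data.

```python
def add_layer(schema, set_name, layout_name, layer_name, obj_name):
    """Insert one layer into a {set: {layout: {layer: object}}} dict
    without discarding entries that are already there (hypothetical helper)."""
    layouts = schema.setdefault(set_name, {})      # create the set level if missing
    layers = layouts.setdefault(layout_name, {})   # create the layout level if missing
    layers[layer_name] = obj_name                  # one object name per layer
    return schema


# Replacing with update(): the second row overwrites the first one.
replaced = {}
replaced.update({'RESTAURANT': {'RR_FACING1': {'BACKDROP': 'restaurant1'}}})
replaced.update({'RESTAURANT': {'RR_FACING1': {'LAYER_2': 'obj_2'}}})  # hypothetical row
print(replaced)

# Merging with setdefault(): both layers survive under the same layout.
merged = {}
add_layer(merged, 'RESTAURANT', 'RR_FACING1', 'BACKDROP', 'restaurant1')
add_layer(merged, 'RESTAURANT', 'RR_FACING1', 'LAYER_2', 'obj_2')  # hypothetical row
print(merged)
```

With `update()` only the `LAYER_2` entry remains; with `add_layer()` both `BACKDROP` and `LAYER_2` end up under `RR_FACING1`.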

2 answers:

Answer 0 (score: 0)

Wow, that's a lot of code!

Maybe try something simpler:

with open('file.csv') as f:
    # assuming ";" is your CSV field separator; strip the trailing newline first
    keys = f.readline().rstrip('\n').split(';')
    for line in f:
        vals = line.rstrip('\n').split(';')
        d = dict(zip(keys, vals))
        print(d)

Then either create a better data file (without blanks), or make the parser remember the previous values.
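The second option, having the parser remember the previous values, might look roughly like the minimal sketch below. It carries the last non-empty SET and LAYOUT forward into rows where those cells are blank and merges every row into one nested dict. It assumes a comma-separated file with one header row and the columns in the order set, layout, layer, object name; `parse_layouts` is a hypothetical name and the sample rows are made up for illustration.

```python
import csv
import io


def parse_layouts(rows):
    """Build {set: {layout: {layer: object}}} from CSV rows in which a blank
    SET or LAYOUT cell means "same as the row above".
    Assumed column order: set, layout, layer, object name."""
    schema = {}
    curr_set = curr_layout = ''
    rows = iter(rows)
    next(rows, None)                      # skip the header row
    for row in rows:
        if len(row) < 4 or not any(cell.strip() for cell in row):
            continue                      # ignore short or completely empty lines
        set_val, layout_val, layer, obj = (cell.strip() for cell in row[:4])
        if set_val:                       # a filled SET cell starts a new set
            curr_set = set_val
        if layout_val:                    # a filled LAYOUT cell starts a new layout
            curr_layout = layout_val
        layers = schema.setdefault(curr_set, {}).setdefault(curr_layout, {})
        layers[layer] = obj
    return schema


# Hypothetical sample data standing in for the converted spreadsheet.
sample = """SET,LAYOUT,LAYER,OBJECT
RESTAURANT,RR_FACING1,BACKDROP,restaurant1
,,LAYER_2,obj_2
,RR_FACING2,BACKDROP,restaurant2
"""
print(parse_layouts(csv.reader(io.StringIO(sample))))
```

For a real file, open it with `newline=''` and pass the reader in, e.g. `with open(path, newline='') as f: schema = parse_layouts(csv.reader(f))`. Using `setdefault` at each level is what keeps earlier layouts and layers from being overwritten.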

Answer 1 (score: 0)

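While I agree with @AK47 that the Code Review site might be a better place for this, I have gotten a lot of help from SO and am trying to give a little back: IMHO, you are overthinking this. Below is an approach that should point you in the right direction, and it doesn't even require converting from Excel to CSV (I like the xlrd module; it's very easy to use). If you already have a CSV, just swap the loop in the process_sheet() function. Basically, I just store the last values of "SET" and "LAYOUT", and if a row has a different (non-empty) value, I set the new one. Hope that helps. And yes, you should consider a better data structure (redundancy isn't always bad, if it lets you avoid empty cells :-)).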

import xlrd

def process_sheet(sheet : xlrd.sheet.Sheet):
    curr_set = ''
    curr_layout = ''
    for rownum in range(1, sheet.nrows):
        row = sheet.row(rownum)
        set_val = row[0].value.strip()
        layout_val = row[1].value.strip()
        if set_val != '' and set_val != curr_set:
            curr_set = set_val
        if layout_val != '' and layout_val != curr_layout:
            curr_layout = layout_val
        result = {curr_set: {curr_layout: {row[2].value: row[3].value}}}
        print(repr(result))


def main():
    # open a workbook (adapt your filename)
    # then get the first sheet (index 0)
    # and call the process function
    # note: xlrd 2.0 and later only open .xls files; .xlsx needs xlrd < 2.0
    wbook = xlrd.open_workbook('/tmp/test.xlsx')
    sheet = wbook.sheet_by_index(0)
    process_sheet(sheet)

if __name__ == '__main__':
    main()
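For a file that is already CSV, the answer says only the loop in `process_sheet()` needs to change. A sketch of that swap with the standard `csv` module, assuming the same column order and a single header row; the name `process_csv_rows` is hypothetical, and unlike the original it returns the per-row dicts instead of printing them so they can be reused.

```python
import csv
import io


def process_csv_rows(rows):
    """CSV counterpart of process_sheet() (hypothetical name): remember the
    last SET and LAYOUT and produce one {set: {layout: {layer: object}}}
    dict per data row."""
    curr_set = ''
    curr_layout = ''
    results = []
    reader = iter(rows)
    next(reader, None)          # skip the header row, like range(1, sheet.nrows)
    for row in reader:
        if not row:
            continue            # csv.reader yields [] for blank lines
        set_val = row[0].strip()
        layout_val = row[1].strip()
        if set_val != '' and set_val != curr_set:
            curr_set = set_val
        if layout_val != '' and layout_val != curr_layout:
            curr_layout = layout_val
        results.append({curr_set: {curr_layout: {row[2]: row[3]}}})
    return results


# Hypothetical input; for a real file use
#     with open('layouts.csv', newline='') as f:
#         rows = process_csv_rows(csv.reader(f))
sample = "SET,LAYOUT,LAYER,OBJECT\nRESTAURANT,RR_FACING1,BACKDROP,restaurant1\n,,LAYER_2,obj_2\n"
for result in process_csv_rows(csv.reader(io.StringIO(sample))):
    print(repr(result))
```

As in the xlrd version, each dict still describes a single row; combining them into one structure still means merging level by level (for example with `dict.setdefault`) rather than calling `update()` on the top-level dict.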