How to ... using Python

Time: 2015-08-24 18:13:31

Tags: python xml csv generator

Below is the CSV file I am working with:

`"A","B","C","D","E","F","G","H","I","J"

"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0`

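The functions below are commented. In short, the function get_clientresponses_two reads the CSV above and selects the data instances in column E (the XML data). Two generator functions parse the XML data in **column E** so that the XML tags and their text are converted into a Python dictionary. Specifically, flatten_dict() returns an iterable sequence of (key, value) pairs, which can be turned into a list of pairs with list(flatten_dict(root)).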
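The question does not show flatten_dict() itself, so the following is only a minimal sketch of what such a generator might look like. The key naming (tag plus attribute name, for example ResponseT) is an assumption inferred from the headers in the output further down, and the sample XML string is shortened for illustration.

```python
from xml.etree import ElementTree


def flatten_dict(element):
    """Yield (key, value) pairs for an element's attributes and leaf descendants."""
    # Attributes are keyed as <tag><attribute>, e.g. 'ResponseT' (assumed naming).
    for name, value in element.attrib.items():
        yield element.tag + name, value
    children = list(element)
    # Container elements contribute no pair of their own; only their children do.
    for child in children:
        for pair in flatten_dict(child):
            yield pair
    if not children:
        # Leaf element: an empty tag such as <FName /> yields an empty string.
        yield element.tag, element.text or ''


root = ElementTree.XML(
    '<Response T="3" RequestType="MS"><MS><Ob>0-12-af</Ob><FName /></MS></Response>'
)
pairs = list(flatten_dict(root))
flat = dict(pairs)
```

Note that converting the pairs to a dict keeps only the last value for a tag that occurs more than once, such as the repeated T element in the sample data.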

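The output written so far produces a dictionary. The function allocate_and_write_data then takes these dictionaries and builds two different collections. One is a set that is updated with the keys from flatten_dict(), which ensures that the element tags in the XML appear as headers in the newly written CSV (along with their corresponding values). The code is written to keep the headers intact (no duplicates) and to allow new element tags to become headers (with their values). Existing headers and values should also be flexible enough to be updated with new instances (again, uniquely). All other rows are meant to be stored and updated as well. I then convert the headers to a list and use the list comprehension data to account for any missing data instances (using '').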
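The header-union idea described above can be sketched in isolation as follows. The rows and their values are made up for illustration, the output goes to an in-memory buffer rather than a file, and Python 3.7+ is assumed so that a plain dict can serve as an ordered set of headers (the code in the question uses an unordered set instead).

```python
import csv
import io

# Hypothetical rows: each has the shared columns plus whatever XML keys it produced.
rows = [
    {'A': '88', 'B': '18', 'S': 'x'},
    {'A': '22', 'B': '22', 'CNum': 'y'},
]

# A dict used as an ordered set keeps headers unique and in first-seen order.
headers = {}
for row in rows:
    for key in row:
        headers.setdefault(key, None)
headers_list = list(headers)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(headers_list)
for row in rows:
    # Any header missing from this row is written as an empty string.
    writer.writerow([row.get(h, '') for h in headers_list])
output = buffer.getvalue()
```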
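
from __future__ import print_function  # must come before all other imports

import csv
import collections
from collections import OrderedDict
from xml.etree import ElementTree  # needed for ElementTree.XML below
from xml.etree.ElementTree import ParseError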

def get_clientresponses_2(filename = 's.csv'):

    with open(filename, 'rU') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]

        client_responses = data['E'] #returns a list
        for client_response in client_responses:
            xml_string = (''.join(client_response))
            xml_string = xml_string.replace('&amp;', '')
            try:
                root = ElementTree.XML(xml_string)
                print(root) 
            except ParseError:
                print("catastrophic failure")
                continue

def allocate_and_write_2(get_clientresponses_2_gen):

    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)         # read the file as a dictionary for each row ({header : value})
        header = set()
        results = []
        #     data = {} # this is not needed for the purpose of this organization
        for row in reader:
            for get_clientresponses_2 in get_clientresponses_2_gen:
                xml_data = get_clientresponses_2()
                row.update(xml_data)        # just for XML data
                results.append(row)         # everything else
                header.update(row.keys())  # can't forget headers

    #     print(row) # returns dictionary of key values pairs (headers : values)
    #     print(results) # returns list wrapper for dictionary
    #     print(headers) #returns set of all headers
        headers_list = list(header)
    #     print(headers_list) #list form of set

        with open('csv_output.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(headers_list)
            for row in results:
                data = [row.get(x, '') for x in headers_list]
                writer.writerow(data)
    #             writer.writerows(zip(headers_list, data))

The output is as follows:

C,HPhone,Locator,IISO,E,S,FName,LaName,J,D,MemID,ResponseRequestType,T,Email,I,Ob,G,MemPass,Address,A,PrivateMembers,H,CNum,ResponseT,CID,B,F
1,,,,"<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-15 12.00.00 AM,,,88,,26,[REMOVED],10,,18,0-JAN-10 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-22 12.00.00 AM,,,22,,26,[REMOVED],10,,22,0-JAN-22 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-34 12.00.00 AM,,,32,,26,[REMOVED],10,,22,0-JAN-20 12.00.02 AM

However, when I try to call 'get_clientresponses_two' inside 'allocate_and_write', I receive the following error:

<ipython-input-91-cfd866a1c0b6> in allocate_and_write_2(get_clientresponses_2_gen)
     37         #     data = {} # this is not needed for the purpose of this organization
     38         for row in reader:
---> 39             for get_clientresponses_2 in get_clientresponses_2_gen:
     40                 xml_data = get_clientresponses_2()
     41                 row.update(xml_data)        # just for XML data

TypeError: 'function' object is not iterable

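Based on my understanding of generators and of other posts on this forum, I know the error arises because a function object, rather than a generator, is being iterated. I want to iterate over the generator output by passing the output of the first function, get_clientresponses_two, into the other function. I would appreciate guidance and feedback on how best to correct this.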
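The error itself can be reproduced in a few lines, independent of the CSV and XML handling. In this sketch number_source is a hypothetical stand-in for get_clientresponses_2: iterating over the function object fails, while iterating over the result of calling it works.

```python
def number_source():
    """A generator function: calling it returns an iterable generator object."""
    for n in (1, 2, 3):
        yield n


try:
    for item in number_source:       # the function object itself is not iterable
        pass
except TypeError as exc:
    message = str(exc)

values = [item for item in number_source()]   # calling it yields the items
```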

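1 Answer:

Answer 0: (score: 0)

Thanks to @AnandSKumar for the guidance:

The problem was indeed due to how I was using the iterator construct in the context of the generator function. I replaced my original script with Anand's suggestion: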

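for xml_data in get_clientresponses_2():
    xml_dat = dict(flatten_dict(xml_data))

However, I also had to modify get_clientresponses_2 so that it returns the root of each XML tree, which is then passed to flatten_dict() in this loop.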
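A minimal sketch of that arrangement is given here, under several assumptions: the CSV is a shortened in-memory sample with only columns A and E, get_clientresponses_2 takes an open file object instead of a filename, flatten_dict is a hypothetical stand-in for the author's function, and collect_xml_dicts is an illustrative name for the consuming side.

```python
import csv
import io
from xml.etree import ElementTree

SAMPLE = (
    '"A","E"\n'
    '"88","<Response T=""3""><MS><Ob>0-12-af</Ob><FName /></MS></Response>"\n'
    '"22","<Response T=""10""><MS><Ob>0-12-af</Ob></MS></Response>"\n'
)


def flatten_dict(element):
    # Hypothetical stand-in for the author's flatten_dict(), which is not shown.
    for name, value in element.attrib.items():
        yield element.tag + name, value
    children = list(element)
    for child in children:
        for pair in flatten_dict(child):
            yield pair
    if not children:
        yield element.tag, element.text or ''


def get_clientresponses_2(infile):
    """Yield the root element of every parseable XML string in column E."""
    for row in csv.DictReader(infile):
        try:
            yield ElementTree.XML(row['E'])
        except ElementTree.ParseError:
            continue  # skip rows whose XML cannot be parsed


def collect_xml_dicts(infile):
    """Consume the generator and flatten each root into a dictionary."""
    results = []
    for xml_data in get_clientresponses_2(infile):   # note the call: ()
        results.append(dict(flatten_dict(xml_data)))
    return results


xml_dicts = collect_xml_dicts(io.StringIO(SAMPLE))
```

Keeping the parsing generator and the consuming function separate means each can be tested on its own, which matches the intent of keeping them as two functions.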

I kept them as two separate, independent functions to prevent any side effects.

You only need to call allocate_and_write():

allocate_and_write()

My reasons for doing it this way are detailed here.

Here are the two functions:

if __name__ == "__main__":
    main()