Here is the CSV file I am working with:
`"A","B","C","D","E","F","G","H","I","J"
"88",18,1,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-10 12.00.02 AM,27-JUN-15 12.00.00 AM,"26",667,0
"22",22,1,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-22 12.00.02 AM,27-JUN-22 12.00.00 AM,"26",667,0
"32",22,1,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>","<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",0-JAN-20 12.00.02 AM,27-JUN-34 12.00.00 AM,"26",667,0`
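The functions below are commented. In short, the function get_clientresponses_two (named get_clientresponses_2 in the code) reads the CSV above and selects the data in column E (the XML data). Two generator functions parse the XML data in **column E** so that the XML tags and their text are converted into a Python dictionary. Specifically, flatten_dict() returns an iterable sequence of (key, value) pairs, which can be turned into a list of pairs with list(flatten_dict(root)).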
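Since flatten_dict() itself is not shown in this post, here is a minimal sketch of what such a generator might look like. The key naming (tag plus attribute name for attributes, bare tag for leaf elements) is an assumption inferred from headers such as ResponseT and ResponseRequestType in the output further down, not the original implementation.

```python
from xml.etree import ElementTree


def flatten_dict(node):
    """Yield (key, value) pairs for a parsed XML node and its descendants.

    Hypothetical reconstruction: attributes become tag + attribute name,
    leaf elements become tag -> text ('' when the element is empty).
    """
    for attr_name, attr_value in node.attrib.items():
        yield node.tag + attr_name, attr_value
    children = list(node)
    if not children:
        # Leaf element: its text is the value.
        yield node.tag, node.text or ''
    for child in children:
        # Recurse into nested elements and re-yield their pairs.
        for pair in flatten_dict(child):
            yield pair


root = ElementTree.XML(
    '<Response T="3" RequestType="MS"><MS><Memb><Ob>0-12-af</Ob>'
    '<Locator /></Memb><FName /></MS></Response>'
)
pairs = list(flatten_dict(root))   # list of (key, value) tuples
flat = dict(pairs)                 # repeated tags collapse to the last value
```

Note that converting the pairs to a dict keeps only the last value for a tag that appears more than once (for example the two `<T />` elements in the sample rows), which may or may not be what you want.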
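The output as written so far is a dictionary. The function allocate_and_write_data_ (allocate_and_write_2 in the code) then takes these dictionaries and builds two collections. One is a set that is updated with the keys from flatten_dict(). This ensures that the element tags in the XML are included in the header of the newly written CSV, together with their corresponding values. The code is written to keep the header intact (no duplicates) and to allow new element tags to become headers (with their values). Existing headers and values should also be flexible enough to be updated with new instances (again, uniquely). All other rows are meant to be stored and updated as well. I then convert the headers to a list and make sure every row is written in that order by using the list comprehension that builds data: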
from __future__ import print_function  # must be the first statement in the file

import collections
import csv
from collections import OrderedDict
from xml.etree import ElementTree
from xml.etree.ElementTree import ParseError


def get_clientresponses_2(filename='s.csv'):
    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)  # read the file as a dictionary for each row ({header : value})
        data = {}
        for row in reader:
            for header, value in row.items():
                try:
                    data[header].append(value)
                except KeyError:
                    data[header] = [value]
    client_responses = data['E']  # returns a list
    for client_response in client_responses:
        xml_string = ''.join(client_response)
        xml_string = xml_string.replace('&', '')
        try:
            root = ElementTree.XML(xml_string)
            print(root)  # only prints the root; nothing is returned or yielded
        except ParseError:
            print("catastrophic failure")
            continue


def allocate_and_write_2(get_clientresponses_2_gen):
    # NOTE: filename is not defined inside this function, so it has to exist as a global
    with open(filename, 'r') as infile:
        reader = csv.DictReader(infile)  # read the file as a dictionary for each row ({header : value})
        header = set()
        results = []
        # data = {}  # this is not needed for the purpose of this organization
        for row in reader:
            # this loop raises the TypeError shown below when a function object is passed in
            for get_clientresponses_2 in get_clientresponses_2_gen:
                xml_data = get_clientresponses_2()
                row.update(xml_data)  # just for XML data
            results.append(row)  # everything else
            header.update(row.keys())  # can't forget headers
            # print(row)  # returns dictionary of key value pairs (headers : values)
        # print(results)  # returns list wrapper for dictionary
        # print(header)  # returns set of all headers
    headers_list = list(header)
    # print(headers_list)  # list form of set
    with open('csv_output.csv', 'wt') as f:
        writer = csv.writer(f)
        writer.writerow(headers_list)
        for row in results:
            data = [row.get(x, '') for x in headers_list]
            writer.writerow(data)
        # writer.writerows(zip(headers_list, data))
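The header handling described above (collect every key once, then write each row in header order with blanks for missing keys) can also be sketched with csv.DictWriter. This is only an illustration under assumptions: write_rows and the sample rows are hypothetical, and the sketch targets Python 3 and keeps headers in first-seen order rather than the arbitrary order of a set.

```python
import csv
import io


def write_rows(rows, out):
    """Write dict rows to out, using the union of all keys as the header."""
    headers = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:      # keep each header once, in first-seen order
                seen.add(key)
                headers.append(key)
    # restval fills in columns that a given row does not have
    writer = csv.DictWriter(out, fieldnames=headers, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return headers


# Hypothetical rows: original CSV columns merged with flattened XML keys.
rows = [
    {'A': '88', 'B': '18', 'ResponseT': '3'},
    {'A': '22', 'B': '22', 'Ob': '0-12-af'},
]
buffer = io.StringIO()
headers = write_rows(rows, buffer)
lines = buffer.getvalue().splitlines()
```

Using a list plus a set instead of a bare set keeps the column order stable between runs, which makes the output easier to compare.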
The output is as follows:
C,HPhone,Locator,IISO,E,S,FName,LaName,J,D,MemID,ResponseRequestType,T,Email,I,Ob,G,MemPass,Address,A,PrivateMembers,H,CNum,ResponseT,CID,B,F
1,,,,"<Response T=""3"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""34"" ReqType=""MS""><IISO /><CID>2</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-15 12.00.00 AM,,,88,,26,[REMOVED],10,,18,0-JAN-10 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""45"" ReqType=""MS""><IISO /><CID>4</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-22 12.00.00 AM,,,22,,26,[REMOVED],10,,22,0-JAN-22 12.00.02 AM
1,,,,"<Response T=""10"" RequestType=""MS""><MS><Memb><PrivateMembers /><Ob>0-12-af</Ob><Locator /></Memb><S>[REMOVED]</S><CNum>[REMOVED]</CNum><FName /><LaName /><Address /><HPhone /><Email /><IISO /><MemID /><MemPass /><T /><CID /><T /></MS></Response>",[REMOVED],,,0,"<Req TID=""15"" ReqType=""MS""><IISO /><CID>45</CID><MemID>0000</MemID><MemPass /><RequestData><S>[REMOVED]</S><Na /><La /><Card>[REMOVED]</Card><Address /><HPhone /><Mail /></ReqData></Req>",,MS,,,667,0-12-af,27-JUN-34 12.00.00 AM,,,32,,26,[REMOVED],10,,22,0-JAN-20 12.00.02 AM
However, when I try to call 'get_clientresponses_two' inside 'allocate_and_write', I get the following error:
<ipython-input-91-cfd866a1c0b6> in allocate_and_write_2(get_clientresponses_2_gen)
37 # data = {} # this is not needed for the purpose of this organization
38 for row in reader:
---> 39 for get_clientresponses_2 in get_clientresponses_2_gen:
40 xml_data = get_clientresponses_2()
41 row.update(xml_data) # just for XML data
TypeError: 'function' object is not iterable
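Based on my understanding of generators and of other posts on this forum, I believe the error comes from how the generator is being used. I want to iterate over the generator output by passing the output of the first function, get_clientresponses_two, into the second function. I would appreciate guidance and feedback on how best to correct this.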
Answer 0 (score: 0)
Thanks to @AnandSKumar for the guidance.
The error was indeed caused by how I used the iterator construct with the generator function: the loop was given the function itself, which is not iterable, rather than the generator it returns when called. I replaced the loop in my original script with Anand's suggestion:
for xml_data in get_clientresponses_2():
    xml_dat = dict(flatten_dict(xml_data))
However, I also had to modify get_clientresponses_2 so that it returns (yields) the root of each XML tree, which is then passed to flatten_dict.
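To make the difference concrete, here is a small self-contained sketch. The names iter_roots and samples are hypothetical stand-ins (not the functions from the question); the sketch only illustrates that a generator function has to be called before it can be iterated, and that it must yield the parsed roots rather than print them.

```python
from xml.etree import ElementTree


def iter_roots(xml_strings):
    """Generator function: yield one parsed root per well-formed XML string."""
    for xml_string in xml_strings:
        try:
            yield ElementTree.XML(xml_string)
        except ElementTree.ParseError:
            # Skip malformed documents instead of aborting the whole run.
            continue


# Hypothetical sample data; the middle entry is deliberately malformed.
samples = ['<Response T="3" />', '<broken>', '<Response T="10" />']

# Iterating over the function object itself reproduces the error type
# reported in the question.
try:
    for root in iter_roots:
        pass
except TypeError as exc:
    error_message = str(exc)

# Calling the function returns a generator, which is iterable.
values = [root.get('T') for root in iter_roots(samples)]
```

The generator is lazy, so each XML string is parsed only when the consuming loop asks for the next root.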
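I kept them as two mutually exclusive functions to prevent any side effects.
You then only need to call allocate_and_write():
allocate_and_write()
The reason I did it this way is detailed here.
Here are the two function suites: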
if __name__ == "__main__":
    main()