I managed to get the following code working:
import collections
import re
from lxml import etree

## Up here is code for getting an .xml input file from the user, opening that file, etc. ##
## This part is in a for loop that goes over each order in the xml file ##
## This all would have an extra indent because it is under this: for order in root.xpath('//order'): ##
itemlist = []
## This part looks through the .xml file for the order it is currently iterating and puts the items into a list ##
for element in order.iter('items'):
    itemlist.append(element.get('type').upper())
## This part 'sanitizes' the order name from the .xml file for use as a key ##
for element in order.iter('order'):
    ordername = element.get('name')
    strippedordername = re.sub(r'[/\()!@#$%^&*()]', '', ordername)
    allordernames.append(strippedordername)
    print(strippedordername)
#print itemlist
## This bit compiles a shopping list of items in a special dict subclass called a Counter. ##
ordercounter.update(itemlist)
## This part makes a dict with order names for its keys and their corresponding Counter of items as its values ##
ordersdictsdict[strippedordername] = collections.Counter(itemlist)

## The per-order loop ends here; the rest runs once, after all orders have been read ##
zeros = dict((k, 0) for k in ordercounter.keys())
for cntr in ordersdictsdict.values():
    cntr.update(zeros)
#print ordercounter
#print ordersdictsdict
key_order = list(ordercounter.keys())
print(key_order)
with open(out_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(ordercounter[k]) for k in key_order) + '\n')
    for ordername, dct in ordersdictsdict.items():
        fout.write(ordername + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')
The output ends up looking like this:
Order,Spam,Eggs,Baked Beans,Sausage
Totals,13,1,1,1
Order for Joe,2,1,0,1
Order for Jill,11,0,1,0
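My script takes an input XML file and parses it, looking for each order's name and then its contents. A single XML file can contain multiple orders. I then have a Counter that tallies every item across all the orders and gives me a complete shopping list.
Given these two sample orders: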
Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam
the Counter looks like this:
Counter({'Spam': 13, 'Baked Beans': 1, 'Egg': 1, 'Sausage': 1})
I then write it to a CSV file so that it looks like this:
Item,Count
Spam,13
Baked Beans,1
Egg,1
Sausage,1
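While the total shopping list is nice, I would like to expand my output CSV file to include a shopping list for each order name. I don't care whether the order names are rows or columns. I also don't much care whether the cells for items that are not in a given order are 0 or empty, but I'll use 0 in my examples.
Example of the desired output with order names as rows: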
Order Name,Spam,Baked Beans,Egg,Sausage
Totals,13,1,1,1
Order for Joe,2,0,1,1
Order for Jill,11,1,0,0
Example of the desired output with order names as columns:
Item,Totals,Order for Joe,Order for Jill
Spam,13,2,11
Baked Beans,1,0,1
Egg,1,1,0
Sausage,1,1,0
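I want this script to work for any input file - of course, if the input contains only one order, the Totals will match that order. I have to build a total Counter first (so that I have every possible item for the orders in question) and then fill in the CSV with the counts for each order. In other words, I can't start my CSV file by hard-coding the item names, because the next input file may have different items in its orders.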
Answer 0 (score: 1)
Why not use a Counter for each line of the input file?
from collections import Counter

d = {}
#*1* Alternatively, could use : d = defaultdict(Counter)
with open(inputfile) as input_file:
    for line in input_file:
        for_who, items = line.rstrip('\n').split(':', 1)
        #strip the spaces around each item so ' Spam' and 'Spam' are counted together
        d[for_who] = Counter(item.strip() for item in items.split(','))
        #Alternatively, if using defaultdict at *1*, d[for_who].update(item.strip() for item in items.split(','))
        #This allows "joe" to register multiple shopping lists which get summed into 1

#get totals by `sum`ming your Counters; sum() needs an empty Counter as its start value
totals = sum(d.values(), Counter())

#Now add a 0-dict to each of the dictionaries just to make sure they have all the keys
zeros = dict((k, 0) for k in totals)
for cntr in d.values():
    cntr.update(zeros)

key_order = list(totals.keys()) #list for py2k
with open(output_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(totals[k]) for k in key_order) + '\n')
    for person, dct in d.items():
        fout.write(person + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')
If your item names contain commas, you may need to handle quoting more carefully (think of what the csv module does), but this should give you a good starting point.
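To illustrate the quoting point, here is a minimal Python 3 sketch that swaps the manual `','.join(...)` for `csv.writer`, which quotes any field containing a comma. The function name `write_order_table` and the sample orders (including the item "Eggs, scrambled") are hypothetical and only serve the illustration; the columns are sorted alphabetically, which is a choice made here, not something the answer requires.

```python
import csv
import io
from collections import Counter


def write_order_table(orders, out):
    """Write a totals row plus one row per order to a file-like object.

    `orders` maps an order name to a Counter of its items (hypothetical shape).
    """
    totals = sum(orders.values(), Counter())
    key_order = sorted(totals)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Order"] + key_order)
    writer.writerow(["Totals"] + [totals[k] for k in key_order])
    for name, counts in orders.items():
        # A Counter returns 0 for missing keys, so no zero-filling is needed.
        writer.writerow([name] + [counts[k] for k in key_order])


# Hypothetical sample data; one item name contains a comma on purpose.
orders = {
    "Order for Joe": Counter(["Spam", "Eggs, scrambled", "Spam"]),
    "Order for Jill": Counter(["Spam", "Baked Beans"]),
}
buf = io.StringIO()
write_order_table(orders, buf)
print(buf.getvalue())
```

Passing an open file instead of the `io.StringIO` buffer would write the same rows to disk; when doing so, the csv documentation recommends opening the file with `newline=''`.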
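Answer 1 (score: 1)
You can use csv.DictWriter to manage the output.
You will assemble a long list of Counters, one per order, plus one Counter containing the totals.
As you read the input, process it as follows: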
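One way the csv.DictWriter idea could look is sketched below in Python 3. This is an illustration under assumptions, not the answerer's code: the helper names `parse_order_line` and `write_orders` are hypothetical, the input is assumed to be lines of the form `Name: item, item, ...`, and the columns are sorted alphabetically. The `restval=0` argument makes DictWriter fill in 0 for items an order does not contain.

```python
import csv
import io
from collections import Counter


def parse_order_line(line):
    """Split 'Name: item, item, ...' into (name, Counter of items)."""
    name, _, items = line.partition(":")
    counts = Counter(item.strip() for item in items.split(",") if item.strip())
    return name.strip(), counts


def write_orders(lines, out):
    per_order = []      # (name, Counter) pairs, one per order
    totals = Counter()  # running total across all orders
    for line in lines:
        name, counts = parse_order_line(line)
        per_order.append((name, counts))
        totals.update(counts)

    fieldnames = ["Order Name"] + sorted(totals)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=0,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerow({"Order Name": "Totals", **totals})
    for name, counts in per_order:
        writer.writerow({"Order Name": name, **counts})


# Hypothetical sample input, shortened from the question's two orders.
sample = [
    "Order for Joe: Spam, Egg, Sausage, Spam",
    "Order for Jill: Spam, Spam, Baked Beans",
]
buf = io.StringIO()
write_orders(sample, buf)
print(buf.getvalue())
```

Because the header is built from the totals Counter, the set of columns adapts to whatever items appear in a given input file.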
Answer 2 (score: 1)
I would suggest using a nested collections.defaultdict whose values default to 0.
Suppose your input file looks like this:
Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam
You can then get the totals and the individual order counts as follows:
import collections

# defaultdict needs a callable factory, so the inner defaultdict is wrapped in a lambda
answer = collections.defaultdict(lambda: collections.defaultdict(int))
with open('path/to/input') as infile:
    for line in infile:
        name, _, orders = line.partition(":")
        name = name.rpartition(' ')[-1]
        # strip the spaces around each item so ' Spam' and 'Spam' are counted together
        orders = [order.strip() for order in orders.split(',')]
        for order in orders:
            answer['total'][order] += 1
            answer[name][order] += 1

# open the output file for writing and end every row with a newline
with open('path/to/output', 'w') as outfile:
    keys = sorted(answer['total'])
    outfile.write("Order Name,%s\n" % ','.join(keys))
    outfile.write('total,%s\n' % ','.join(str(answer['total'][k]) for k in keys))
    for name, orders in answer.items():
        if name != 'total':
            outfile.write('%s,%s\n' % (name, ','.join(str(orders[k]) for k in keys)))
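The question also accepts a layout with order names as columns, which none of the code above produces. A minimal Python 3 sketch of that transposed layout follows, assuming the per-order counts are already available as a dict mapping order names to Counters; the function name `write_items_as_rows` is hypothetical, and sorting items by descending total (then alphabetically) is just one possible row order.

```python
import csv
import io
from collections import Counter


def write_items_as_rows(orders, out):
    """Write one row per item, with a Totals column and one column per order."""
    totals = sum(orders.values(), Counter())
    names = list(orders)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Item", "Totals"] + names)
    # Most frequent items first; ties are broken alphabetically.
    for item, total in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        # A Counter returns 0 for items an order does not contain.
        writer.writerow([item, total] + [orders[n][item] for n in names])


# The two sample orders from the question.
orders = {
    "Order for Joe": Counter(["Spam", "Egg", "Sausage", "Spam"]),
    "Order for Jill": Counter(["Spam"] * 11 + ["Baked Beans"]),
}
buf = io.StringIO()
write_items_as_rows(orders, buf)
print(buf.getvalue())
```

With the question's sample orders this reproduces the "order names as columns" example from the question.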