Comparing two dicts to build a CSV for inventory management

Asked: 2012-12-13 15:38:15

Tags: python csv

Update: Solution

I managed to get the following code working:

import collections
import re
from lxml import etree

## Up here is code for getting an .xml input file from the user, opening that file, etc. ##
## That code is assumed to define root (the parsed xml root) and out_file (the output csv path). ##
## The three containers below are not shown in the original snippet; they are assumed to be set up like this. ##
allordernames = []
ordercounter = collections.Counter()
ordersdictsdict = {}

## This loop goes over each order in the xml file ##
for order in root.xpath('//order'):
    itemlist = []
    ## This part looks through the .xml file for the order it is currently iterating and puts the items into a list ##
    for element in order.iter('items'):
        itemlist.append(element.get('type').upper())
    ## This part 'sanitizes' the order name from the .xml file for use as a key ##
    for element in order.iter('order'):
        ordername = element.get('name')
        strippedordername = re.sub(r'[/()!@#$%^&*]', '', ordername)
        allordernames.append(strippedordername)
        print(strippedordername)
        #print(itemlist)
        ## This bit compiles a shopping list of items in a special dict subclass called a Counter. ##
        ordercounter.update(itemlist)
        ## This part makes a dict with order names for its keys and their corresponding Counter of items as its values ##
        ordersdictsdict[strippedordername] = collections.Counter(itemlist)

## Pad every per-order Counter with zeros so that each one has every item as a key ##
zeros = dict((k, 0) for k in ordercounter.keys())
for cntr in ordersdictsdict.values():
    cntr.update(zeros)

#print(ordercounter)
#print(ordersdictsdict)
key_order = list(ordercounter.keys())
print(key_order)
## The with block closes the file on exit, so no explicit close is needed ##
with open(out_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(ordercounter[k]) for k in key_order) + '\n')
    for ordername, dct in ordersdictsdict.items():
        fout.write(ordername + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')

The output ends up looking like this:

Order,Spam,Eggs,Baked Beans,Sausage
Totals,13,1,1,1
Order for Joe,2,1,0,1
Order for Jill,11,0,1,0

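What I have

My script takes an input xml file and parses it, looking for each order's name and then its contents. A single xml file can contain multiple orders. I then use a Counter to tally every item across all orders, which gives me a complete shopping list.

Given these two sample orders:

Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam

The Counter looks like this: Counter({'Spam': 13, 'Baked Beans': 1, 'Egg': 1, 'Sausage': 1})

I then write it to a csv file so that it looks like this: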

Item,Count
Spam,13
Baked Beans,1
Egg,1
Sausage,1
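
For reference, the totals step described above can be sketched roughly as follows. This is a minimal Python 3 illustration, not the asker's actual script: the `orders` dict is a hypothetical in-memory stand-in for the parsed xml, and the output goes to an `io.StringIO` buffer rather than a real file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the parsed xml: order name -> list of items.
orders = {
    "Order for Joe": ["Spam", "Egg", "Sausage", "Spam"],
    "Order for Jill": ["Spam"] * 7 + ["Baked Beans"] + ["Spam"] * 4,
}

# A single Counter accumulates the items of every order.
totals = Counter()
for items in orders.values():
    totals.update(items)

# Write the two-column shopping list; most_common() sorts by descending count.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Item", "Count"])
for item, count in totals.most_common():
    writer.writerow([item, count])

shopping_list_csv = buffer.getvalue()
print(shopping_list_csv)
```

Swapping the buffer for `open(out_file, "w", newline="")` would write the same rows to disk.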

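What I want

The total shopping list is nice, but I would like to extend my output csv file to include a shopping list for each order name as well. I don't care whether the order names end up as rows or as columns. I also don't much care whether the cells for items that are not in a given order are 0 or empty, but I will use 0 in my examples.

Example of desired output with order names as rows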

Order Name,Spam,Baked Beans,Egg,Sausage
Totals,13,1,1,1
Order for Joe,2,0,1,1
Order for Jill,11,1,0,0

Example of desired output with order names as columns

Item,Totals,Order for Joe,Order for Jill
Spam,13,2,11
Baked Beans,1,0,1
Egg,1,1,0
Sausage,1,1,0
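
The column-oriented layout can be produced by looping over items instead of over orders. The following is only a rough Python 3 sketch under assumed data: `per_order` is a hypothetical dict of per-order Counters, and the result is written to an in-memory buffer.

```python
import csv
import io
from collections import Counter

# Hypothetical per-order Counters, matching the sample orders above.
per_order = {
    "Order for Joe": Counter(["Spam", "Egg", "Sausage", "Spam"]),
    "Order for Jill": Counter(["Spam"] * 11 + ["Baked Beans"]),
}

# Summing Counters needs an explicit Counter() start value.
totals = sum(per_order.values(), Counter())

order_names = list(per_order)
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Item", "Totals"] + order_names)
for item, total in totals.most_common():
    # A Counter returns 0 for a missing key, so no zero-padding is required.
    writer.writerow([item, total] + [per_order[name][item] for name in order_names])

items_as_rows_csv = buffer.getvalue()
print(items_as_rows_csv)
```

Because a Counter already reports 0 for items an order does not contain, the zero-filled cells come for free in this orientation.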

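Notes

I want this script to work for any input file - of course, if the input contains only one order, Totals will simply match that order. I have to build the total Counter first (so that I know every possible item across the orders in question) and then fill in the csv with each order's counts. In other words, I can't start my csv file by hard-coding the item names, because the next input file may have different items in its orders.

3 Answers:

Answer 0: (score: 1)

Why not use a Counter for each line of the input file?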

from collections import Counter
d = {}
#*1* Alternatively, could use : d = defaultdict(Counter)
with open(inputfile) as input_file:
    for line in input_file:
        for_who, items = line.rstrip('\n').split(':', 1)
        #strip each item so that ' Egg' and 'Egg' count as the same key
        d[for_who] = Counter(item.strip() for item in items.split(','))
        #Alternatively, if using defaultdict at *1*: d[for_who].update(item.strip() for item in items.split(','))
        #This allows "joe" to register multiple shopping lists which get summed into 1

#get totals by `sum`ming your Counters; the Counter() start value is required
#because sum() otherwise starts from the int 0, which cannot be added to a Counter
totals = sum(d.values(), Counter())

#Now add a 0-dict to each of the dictionaries just to make sure they have all the keys
zeros = dict((k, 0) for k in totals)
for cntr in d.values():
    cntr.update(zeros)

key_order = list(totals.keys())  #list for py2k
with open(output_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(totals[k]) for k in key_order) + '\n')
    for person, dct in d.items():
        fout.write(person + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')

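If your item names contain commas, you may need to handle quoting more carefully (think of what the csv module does for you), but this should give you a good starting point.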
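
To illustrate the quoting point, here is a small Python 3 sketch that swaps the manual `','.join(...)` calls for `csv.writer`, which quotes any field containing the delimiter. The sample order and the item name "Eggs, scrambled" are made up for the example, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io
from collections import Counter

# Hypothetical order whose second item name contains a comma.
d = {"Order for Joe": Counter(["Spam", "Eggs, scrambled", "Spam"])}
totals = sum(d.values(), Counter())
key_order = list(totals)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Order"] + key_order)
writer.writerow(["Totals"] + [totals[k] for k in key_order])
for person, counts in d.items():
    writer.writerow([person] + [counts[k] for k in key_order])

quoted_csv = buffer.getvalue()
print(quoted_csv)

# Reading the text back with csv.reader recovers the original item name.
header = next(csv.reader(io.StringIO(quoted_csv)))
```

With plain string joins the header would gain an extra column; `csv.writer` keeps the item in a single quoted field.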

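Answer 1: (score: 1)

You can use csv.DictWriter to manage the output.

You will assemble a list of Counters, one per order, plus one Counter holding the totals.

As you read the input, process it as follows:

  1. Add each item in the order to the "totals" dictionary using .update
  2. Add each item in the order to an "order" dictionary by creating a new Counter
  3. Add an "order name" key to each counter, with the order name as its value
  4. Create a DictWriter instance with totals.keys() as the field names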
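
One possible reading of these steps, as a hedged Python 3 sketch rather than the answerer's own code: the `orders` dict is a hypothetical stand-in for the parsed input, and the "Order Name" column is prepended to the field names here as an assumption, since by default `csv.DictWriter` raises `ValueError` for row keys that are missing from `fieldnames`.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the parsed input: order name -> list of items.
orders = {
    "Order for Joe": ["Spam", "Egg", "Sausage", "Spam"],
    "Order for Jill": ["Spam"] * 11 + ["Baked Beans"],
}

totals = Counter()
rows = []
for name, items in orders.items():
    totals.update(items)           # add the order's items to the totals
    row = dict(Counter(items))     # a fresh Counter for this order
    row["Order Name"] = name       # label the row with the order name
    rows.append(row)

totals_row = dict(totals)
totals_row["Order Name"] = "Totals"

# Assumption: the name column goes first, followed by every item seen.
fieldnames = ["Order Name"] + list(totals)

buffer = io.StringIO()
# restval=0 fills in items that a given order does not contain.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval=0, lineterminator="\n")
writer.writeheader()
writer.writerow(totals_row)
writer.writerows(rows)

dictwriter_csv = buffer.getvalue()
print(dictwriter_csv)
```

The `restval` argument replaces the zero-padding step used in the other answers.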

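Answer 2: (score: 1)

I suggest using nested collections.defaultdict objects whose counts default to 0.

Suppose your input file looks like this:

Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam

Then you can get the totals and the per-order counts as follows: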

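import collections

# defaultdict needs a callable factory, hence the lambda for the inner dict
answer = collections.defaultdict(lambda: collections.defaultdict(int))
with open('path/to/input') as infile:
    for line in infile:
        name, _, orders = line.partition(":")
        name = name.rpartition(' ')[-1]  # "Order for Joe" -> "Joe"
        # strip each item so that ' Egg' and 'Egg' count as the same key
        orders = [order.strip() for order in orders.split(',')]
        for order in orders:
            answer['total'][order] += 1
            answer[name][order] += 1
# open the output in write mode and end every row with a newline
with open('path/to/output', 'w') as outfile:
    keys = sorted(answer['total'])
    outfile.write("Order Name,%s\n" % ','.join(keys))
    outfile.write('total,%s\n' % ','.join(str(answer['total'][k]) for k in keys))
    for name, orders in answer.items():
        if name != 'total':
            # counts are ints, so convert them to str before joining
            outfile.write('%s,%s\n' % (name, ','.join(str(orders[k]) for k in keys)))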