Removing specific text and characters from a dictionary

Time: 2017-05-30 09:07:55

Tags: python python-2.7 dictionary

I have a list like this (only bigger) called 'data_list':

2017-04-01, available
2017-04-02, available
2017-04-01, available
2017-04-02, available
2017-04-02, available
2017-04-01, available
2017-04-02, available
2017-04-01, available
2017-04-02, available
2017-04-01, available
etcetera

I used the following code,

import collections

dates = collections.defaultdict(list)
for date, xyz in data_list:
    dates[date].append(xyz)
counts = {date: collections.Counter(xyz) for date, xyz in dates.items()}

to create a dictionary like this:

2017-04-01,Counter({'available': 9})
2017-04-02,Counter({'available': 12})
2017-04-03,Counter({'available': 9})
2017-04-04,Counter({'available': 4})
2017-04-05,Counter({'available': 9})
2017-04-06,Counter({'available': 2})

How would I go about removing "Counter" (and, eventually, characters such as '(' and '{')?

Currently, I have this code, but it doesn't do anything.

for x in my_dictionary:
    try:
        x = x.replace('Counter','')
    except:
        pass

The ultimate goal is to get a .csv file like this:

date, available
2017-04-01, 9
2017-04-02, 12
2017-04-03, 9
2017-04-04, 4
2017-04-05, 9
2017-04-06, 2  

Partial printout of the dictionary:

'2018-12-12': Counter({'available': 3}), '2018-04-28': Counter({'available': 4}), '2017-12-16': Counter({'available': 2}), '2017-12-17': Counter({'available': 2}), '2017-12-14': Counter({'available': 2}), '2017-12-15': Counter({'available': 2}), '2017-12-12': Counter({'available': 2}), '2017-12-13': Counter({'available': 2}), '2017-12-10': Counter({'available': 2}), '2017-12-11': Counter({'available': 2}), '2017-12-18': Counter({'available': 2}), '2017-12-19': Counter({'available': 2}), '2018-05-31': Counter({'available': 4}), '2018-05-30': Counter({'available': 4}),

2 Answers:

Answer 0: (score: 1)

In this case you don't need collections.Counter at all, and you can do away with collections.defaultdict as well. This is enough:

dates = {}
for date, value in data_list:
    if value == "available":
        dates[date] = dates.get(date, 0) + 1
# dates contains (date, count) pairs

It should be much faster, too. You can then use csv.writer or csv.DictWriter (depending on the CSV output you want) to write out the final CSV. For example:

import csv

data_list = [['2017-04-01', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-01', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-01', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-01', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-01', 'available']]

dates = {}
for date, value in data_list:
    if value == "available":
        dates[date] = dates.get(date, 0) + 1

with open("output.csv", "wb") as f:  # open output.csv for writing
    writer = csv.writer(f)  # create a csv.writer
    writer.writerow(("date", "available"))  # write our header
    for row in dates.iteritems():  # sorted(dates.iteritems()) instead for date-sorted output
        writer.writerow(row)  # write the row

This gives you a valid CSV:

date,available
2017-04-02,5
2017-04-01,5

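You can open it in almost any spreadsheet application. If you want it formatted like the output in your question, note that that is not valid CSV.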
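The script above targets Python 2.7: the binary "wb" mode and dict.iteritems() are Python 2 idioms. For readers on Python 3, a minimal sketch of the same counting approach follows. The function name write_counts and the choice to write to any file-like object are illustrative assumptions, not part of the original answer.

```python
import csv
import io


def write_counts(data_list, out):
    """Count 'available' entries per date and write them to `out` as CSV."""
    dates = {}
    for date, value in data_list:
        if value == "available":
            dates[date] = dates.get(date, 0) + 1

    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(("date", "available"))  # header row
    # ISO-formatted dates sort chronologically when sorted as plain strings
    writer.writerows(sorted(dates.items()))
    return dates


sample = [["2017-04-01", "available"],
          ["2017-04-02", "available"],
          ["2017-04-01", "available"]]

buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
write_counts(sample, buffer)
print(buffer.getvalue())
```

With a real file, pass the object returned by open("output.csv", "w", newline="") instead of the in-memory buffer.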

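Update: a version for multiple possible values per date (at this point collections.Counter becomes more convenient, but to stay consistent with the approach above):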

import csv

data_list = [['2017-04-01', 'available'],
             ['2017-04-02', 'available'],
             ['2017-04-01', 'booked'],
             ['2017-04-02', 'available'],
             ['2017-04-02', 'booked'],
             ['2017-04-01', 'available'],
             ['2017-04-02', 'blocked'],
             ['2017-04-01', 'blocked'],
             ['2017-04-02', 'blocked'],
             ['2017-04-01', 'available']]

dates = {}
values = set()  # collect every value seen, to build the CSV header later
for date, value in data_list:
    values.add(value)
    dates.setdefault(date, {})[value] = dates.get(date, {}).get(value, 0) + 1

with open("output.csv", "wb") as f:  # open output.csv for writing
    header = ["date"] + list(values)  # set header to date,<available_values>
    writer = csv.DictWriter(f, header)
    writer.writeheader()
    for k, v in dates.iteritems():  # sorted(dates.iteritems()) instead for date-sorted output
        v.update({"date": k})  # add the date to our row
        writer.writerow(v)  # write the row

output.csv is created as:

date,available,booked,blocked
2017-04-02,2,1,2
2017-04-01,3,1,1

You can have as many 'value' fields as you want, so it doesn't have to be just 3.
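As the update notes, collections.Counter gets more convenient once there are several values per date. Below is a sketch of that variant, assuming Python 3. The helper name write_status_counts, the alphabetical column order, and filling absent values with 0 are illustrative choices, not from the original answer.

```python
import collections
import csv
import io


def write_status_counts(data_list, out):
    """Write one CSV row per date with a count column for every value seen."""
    dates = collections.defaultdict(collections.Counter)
    for date, value in data_list:
        dates[date][value] += 1

    # Sort the columns so the header order is predictable (a set has no fixed order).
    values = sorted({value for _, value in data_list})
    writer = csv.DictWriter(out, ["date"] + values, restval=0, lineterminator="\n")
    writer.writeheader()
    for date in sorted(dates):
        row = dict(dates[date], date=date)  # a copy, so the Counter is not modified
        writer.writerow(row)  # values missing for this date are written as restval


sample = [["2017-04-01", "available"],
          ["2017-04-02", "booked"],
          ["2017-04-01", "blocked"],
          ["2017-04-01", "available"]]

buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
write_status_counts(sample, buffer)
print(buffer.getvalue())
```

Building the row as a copy avoids the side effect of adding a "date" key to the counting dictionary itself.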

Answer 1: (score: 0)

You are almost there. You can get the available count from the counter using the 'available' key, like so:

counts = {date: collections.Counter(xyz)['available'] for date, xyz in dates.items()}
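As for why the replace loop in the question has no effect: iterating over a dictionary yields its keys (the date strings), and reassigning x inside the loop never writes anything back. "Counter(...)" is also only how the object prints. A Counter is a dict subclass, so the number is read by key rather than stripped out of text. A small sketch, using two sample entries taken from the question's printout:

```python
import collections

counts = {
    "2017-04-01": collections.Counter({"available": 9}),
    "2017-04-02": collections.Counter({"available": 12}),
}

# Looping over a dict yields its keys; rebinding x leaves the dict untouched.
for x in counts:
    x = x.replace("Counter", "")
unchanged = counts["2017-04-01"]  # still a Counter object

# Read the integer out by key instead of editing the printed text.
plain = {date: counter["available"] for date, counter in counts.items()}
missing = counts["2017-04-01"]["booked"]  # a Counter returns 0 for absent keys

print(plain)
print(missing)
```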


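To write one row per date, with a column for each status, to a CSV file:

import collections
import csv

def to_row(date, counter):
    # a Counter returns 0 for any status that never occurred on that date
    return date, counter['booked'], counter['blocked'], counter['available']

counts = [to_row(date, collections.Counter(xyz)) for date, xyz in dates.items()]

with open('<filename>.csv', 'wb') as f:  # 'wb' for the csv module on Python 2
    writer = csv.writer(f)
    writer.writerows(counts)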