Question

Can anyone help me work out how to iterate over a dictionary by date? I have a data set like this:

data=[{u'a': u'D', u'b': 100.0, u'c': 201L, u'd': datetime.datetime(2007, 12, 29, 0, 0), u'e': datetime.datetime(2008, 1, 1, 6, 27, 41)},
      {u'a': u'W', u'b': 100.0, u'c': 201L, u'd': datetime.datetime(2007, 12, 29, 0, 0), u'e': datetime.datetime(2008, 2, 4, 6, 27, 41)},
      {u'a': u'W', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 1, 1, 4, 20, 44)},
      {u'a': u'D', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 3, 11, 6, 27, 41)},
      {u'a': u'D', u'b': 100.0, u'c': 202L, u'd': datetime.datetime(2007, 12, 30, 0, 0), u'e': datetime.datetime(2008, 5, 8, 11, 2, 41)},
      {u'a': u'D', u'b': 100.0, u'c': 203L, u'd': datetime.datetime(2008, 1, 2, 0, 0), u'e': datetime.datetime(2008, 6, 1, 6, 27, 41)},
      {u'a': u'W', u'b': 100.0, u'c': 204L, u'd': datetime.datetime(2008, 2, 9, 0, 0), u'e': datetime.datetime(2008, 4, 21, 12, 30, 51)},
      {u'a': u'D', u'b': 100.0, u'c': 204L, u'd': datetime.datetime(2008, 2, 9, 0, 0), u'e': datetime.datetime(2008, 8, 15, 15, 45, 10)}]

How can I turn it into a dictionary in the following format?

res={u'201L':(1,0,1),(2,1,0),(3,0,0),(4,0,0).. so on till (12,0,0),
u'202L':(1,1,0),(2,0,0),(3,0,1),(4,0,0),(5,0,1)...(12,0,0),
u'203L':(1,0,0),(2,0,0),(3,0,0),(4,0,0),(5,1,0)...(12,0,0),
u'204L':(1,0,0),(2,0,0),(3,0,0),(4,1,0),(5,0,0),(6,0,0),(7,0,0),(8,0,1)...(12,0,0)}

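Here 1, 2, 3 and so on are the first, second, third month counted from the card's issue date. The issue date for 201L is datetime.datetime(2007, 12, 29, 0, 0) and for 202L it is datetime.datetime(2007, 12, 30, 0, 0).

The first month means from 2007-12-29 to 2008-1-29.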

  (1,0,1)---where 1 is the first month
  0 is the number of W entries
  1 is the number of D entries

I tried something like this:

data_dict=defaultdict(Counter)
date_dic={}
for x in data:
  a,b,c,d=x['a'],x['c'],x['d'],x['e']
  data_dict[b][a] += 1
for key , value in data_dict.items():
   date_dic[key] = tuple(map(datetime.date.isoformat, (c,d)))
   for value in range(1,30):
      if value not x: continue

I am stuck after the if statement and don't know how to build the format above from there. This is what I end up with as output:

defaultdict(<class 'collections.Counter'>, {201L: Counter({u'D': 1, u'W': 1}), 202L: Counter({u'D': 2, u'W': 1}), 203L: Counter({u'D': 1}), 204L: Counter({u'D': 1, u'W': 1})})

Answer 1

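I would create a list of dates, then find the "bucket" in that list that each item belongs in.

You can use datetime.timedelta() objects to create new dates relative to a starting point: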

startdate = data[0]['d']
buckets = [startdate + datetime.timedelta(days=30) * i for i in xrange(12)]

Now you have 12 dates to compare everything else against, so you know which bucket each subsequent value goes into:

>>> buckets
[datetime.datetime(2007, 12, 29, 0, 0), datetime.datetime(2008, 1, 28, 0, 0), datetime.datetime(2008, 2, 27, 0, 0), datetime.datetime(2008, 3, 28, 0, 0), datetime.datetime(2008, 4, 27, 0, 0), datetime.datetime(2008, 5, 27, 0, 0), datetime.datetime(2008, 6, 26, 0, 0), datetime.datetime(2008, 7, 26, 0, 0), datetime.datetime(2008, 8, 25, 0, 0), datetime.datetime(2008, 9, 24, 0, 0), datetime.datetime(2008, 10, 24, 0, 0), datetime.datetime(2008, 11, 23, 0, 0)]
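The buckets above are fixed 30-day spans, while the question describes the first month as running from 2007-12-29 to 2008-1-29. If calendar months are what you need, a minimal sketch of the index calculation could look like the following. The helper name month_index is made up for illustration, and the code is written for Python 3 as an assumption:

```python
import datetime

def month_index(start, when):
    """Return the 0-based count of whole calendar months between start and when.

    Hypothetical helper, not part of the standard library.
    """
    months = (when.year - start.year) * 12 + (when.month - start.month)
    # If 'when' falls before the issue day/time within its month,
    # the current month has not completed yet.
    if (when.day, when.time()) < (start.day, start.time()):
        months -= 1
    return months

issued = datetime.datetime(2007, 12, 29)
print(month_index(issued, datetime.datetime(2008, 1, 1, 6, 27, 41)))  # 0
print(month_index(issued, datetime.datetime(2008, 2, 4, 6, 27, 41)))  # 1
```

One caveat with this sketch: when the issue day does not exist in a later month (for example an issue date on the 31st), the boundary effectively moves to the first day of the following month.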

We can then use the bisect module to find the matching bucket:

from bisect import bisect

bisect(buckets, somedate) - 1  # Returns a value from 0 - 11
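To make the lookup concrete, here is a small self-contained sketch using two of the transaction dates from the question. It assumes Python 3 (range instead of xrange), and the wrapper name bucket_index is only for illustration:

```python
import datetime
from bisect import bisect

start = datetime.datetime(2007, 12, 29)
# Twelve bucket start dates, 30 days apart
buckets = [start + datetime.timedelta(days=30) * i for i in range(12)]

def bucket_index(buckets, when):
    # bisect() returns the insertion point to the right of any equal entry,
    # so subtracting 1 gives the bucket whose start date is <= when.
    return bisect(buckets, when) - 1

print(bucket_index(buckets, datetime.datetime(2008, 1, 1, 6, 27, 41)))  # 0
print(bucket_index(buckets, datetime.datetime(2008, 2, 4, 6, 27, 41)))  # 1
```

Note that a date earlier than the first bucket gives -1, so you may want to check for that before using the result as a list index.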

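We create such buckets per user, so we need to track the buckets in a separate mapping. We will actually create the buckets dynamically, as needed, to fit the current transaction date.

Next, we use a collections.defaultdict instance to track the counts for each key (the c value) in your input: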

from collections import defaultdict

res = defaultdict(list)
empty_counts = {'D': 0, 'W': 0}

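This creates a list for your buckets, plus an empty counts dictionary for deposits and withdrawals. I used a dictionary here because it is much easier to work with than trying to manipulate (immutable) tuples later. I also did not include the month number (1 - 12); there is no point, since you already have the index of each bucket (0 - 11), and you can have a variable number of buckets.

We need to create buckets and counters as needed to fit the current date; rather than scanning the data to find the maximum transaction date per user, we simply expand our bucket and count lists as needed: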

def expand_buckets(buckets, bucket_counts, start, transaction):
    # This function modifies the buckets and bucket_counts lists in-place
    if not buckets:
        # initialize the lists
        buckets.append(start)
        bucket_counts.append(dict(empty_counts))

    # keep adding 30-day spans until we can fit the transaction date
    while buckets[-1] + datetime.timedelta(days=30) < transaction:
        buckets.append(buckets[-1] + datetime.timedelta(days=30))
        bucket_counts.append(dict(empty_counts))

Now we can start counting:

per_user_buckets = defaultdict(list)

for entry in data:
    user = entry['c']
    type = entry['a']
    transaction_date = entry['e']

    buckets = per_user_buckets[user]
    bucket_counts = res[user]
    expand_buckets(buckets, bucket_counts, entry['d'], transaction_date)

    # count transaction date entries per bucket
    bucket = bisect(buckets, transaction_date) - 1
    bucket_counts[bucket][type] += 1

The bisect call makes picking the right bucket quick and easy.

The result for your sample input is:

>>> pprint(dict(res))
{201L: [{'D': 1, 'W': 0}, {'D': 0, 'W': 1}],
 202L: [{'D': 0, 'W': 1},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}],
 203L: [{'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}],
 204L: [{'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 1},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 0, 'W': 0},
        {'D': 1, 'W': 0}]}
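If you do want the (month, W count, D count) tuples from the question, padded out to 12 months, a per-user list of count dictionaries like the one shown above can be converted afterwards. This is only a sketch: the function name to_month_tuples is made up, the sample input is the 201 entry copied from the output above, and the code is written for Python 3 (so the key is 201 rather than 201L):

```python
def to_month_tuples(bucket_counts, months=12):
    """Convert a list of {'D': n, 'W': n} dicts into (month, W, D) tuples.

    Hypothetical helper: missing buckets are padded with zero counts,
    and any buckets beyond `months` are dropped.
    """
    rows = []
    for i in range(months):
        counts = bucket_counts[i] if i < len(bucket_counts) else {'D': 0, 'W': 0}
        rows.append((i + 1, counts['W'], counts['D']))
    return tuple(rows)

# Sample input: the counts for card 201 from the result above
res = {201: [{'D': 1, 'W': 0}, {'D': 0, 'W': 1}]}
formatted = {user: to_month_tuples(counts) for user, counts in res.items()}

print(formatted[201][:3])  # ((1, 0, 1), (2, 1, 0), (3, 0, 0))
```

Keeping the dictionaries while counting and converting only at the end avoids having to rebuild immutable tuples on every increment.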
