I have a list that looks like this:
[
{
"timeline": "2014-10",
"total_prescriptions": 17
},
{
"timeline": "2014-11",
"total_prescriptions": 14
},
{
"timeline": "2014-12",
"total_prescriptions": 8
},
{
"timeline": "2015-1",
"total_prescriptions": 4
},
{
"timeline": "2015-3",
"total_prescriptions": 10
},
{
"timeline": "2015-4",
"total_prescriptions": 3
}
]
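This is essentially the output of a raw SQL query in Django that counts total_prescriptions for each month and sorts the data in ascending order. However, the nature of COUNT in MySQL is that it does not return 0 for empty values. Hence, February is skipped entirely instead of appearing as an entry with total_prescriptions equal to 0.
I plan to iterate over the list in Python and manually add total_prescriptions = 0 for every missing month, so that the output looks like this: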
[
{
"timeline": "2014-10",
"total_prescriptions": 17
},
{
"timeline": "2014-11",
"total_prescriptions": 14
},
{
"timeline": "2014-12",
"total_prescriptions": 8
},
{
"timeline": "2015-1",
"total_prescriptions": 4
},
{
"timeline": "2015-2",
"total_prescriptions": 0
},
{
"timeline": "2015-3",
"total_prescriptions": 10
},
{
"timeline": "2015-4",
"total_prescriptions": 3
}
]
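How would I go about doing this?
Answer 0 (score: 1)
I ended up using Pandas to solve this, since it is apparently much faster for larger datasets and gets the job done elegantly. This is called "resampling" in pandas; first convert your times to numpy datetimes and set them as the index: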
>>> import pandas as pd
>>> df = pd.DataFrame(L) #where L is my list of dictionaries
>>> df.index=pd.to_datetime(df.timeline,format='%Y-%m')
>>> df
           timeline  total_prescriptions
timeline
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3
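You can then use resample('MS') to add the missing months and fillna(0) to convert the nulls to zero. On pandas 0.18 and later, resample returns a Resampler object rather than a DataFrame, so .asfreq() is needed in between; only the numeric column is kept, and astype(int) restores integer counts:
>>> df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0).astype(int)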
>>> df
            total_prescriptions
timeline
2014-10-01                   17
2014-11-01                   14
2014-12-01                    8
2015-01-01                    4
2015-02-01                    0
2015-03-01                   10
2015-04-01                    3
Answer 1 (score: 0)
I changed my approach. The list you start with is my_list:
def getDate(entry):
    """
    Given a list entry dict, return a tuple of ints:
    (year, month)
    """
    date = entry['timeline']
    i = date.index('-')
    month = int(date[i + 1:])
    year = int(date[:4])
    return (year, month)

def supplyMissing(year, month, n):
    """
    Given a year, month, & number of missing entries (ints),
    return a list of entries (dicts)
    """
    entries = []
    for e in range(n):
        if month == 12:
            year += 1
            month = 1
        else:
            month += 1
        entries.append({'timeline': str(year) + '-' + str(month),
                        'total_prescriptions': 0})
    return entries

# Make a copy of the list to work with:
new_list = list(my_list)
# Track how many entries have been inserted so far
c_count = 0
# Iterate over the list
for i in range(len(my_list) - 1):
    entry = my_list[i]
    next_entry = my_list[i + 1]
    year, month = getDate(entry)
    next_year, next_month = getDate(next_entry)
    if ((next_year == year and next_month == month + 1) or
            (next_year == year + 1 and next_month == month - 11)):
        pass
    # If entries are not sequential, determine what to add.
    else:
        # How many months are missing?
        if next_year == year:
            missing_months = next_month - month - 1
        else:
            dif_years = next_year - year
            missing_months = 12 * dif_years + next_month - month - 1
        # Generate missing entries
        missing_entries = supplyMissing(year, month, missing_months)
        # Insert missing entries into the temporary list.
        # c_count already includes every earlier insertion, so it is
        # the only offset needed (adding m as well would skip positions).
        for m in range(missing_months):
            new_list.insert(i + 1 + c_count, missing_entries[m])
            c_count += 1
# Finalize the result
my_list = new_list
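For comparison, here is a shorter sketch of the same idea that avoids index bookkeeping altogether. It is only an illustration under a few assumptions: the helper name fill_missing_months is made up for this example, the input is sorted in ascending order, and every timeline value has the unpadded "YYYY-M" form shown in the question. Each month is mapped to a single integer (year * 12 + month - 1), so walking from the first to the last integer visits every month in between:

```python
def fill_missing_months(entries):
    """Return a new list with a zero-count entry for every month missing
    between the first and last entries.

    Hypothetical helper: assumes ascending order and 'YYYY-M' timelines.
    """
    if not entries:
        return []

    def month_index(timeline):
        # Map 'YYYY-M' to one integer so consecutive months differ by 1.
        year, month = timeline.split('-')
        return int(year) * 12 + int(month) - 1

    # Look up existing counts by month index.
    counts = {month_index(e['timeline']): e['total_prescriptions']
              for e in entries}
    first = month_index(entries[0]['timeline'])
    last = month_index(entries[-1]['timeline'])

    filled = []
    for idx in range(first, last + 1):
        year, month0 = divmod(idx, 12)
        filled.append({
            'timeline': '{}-{}'.format(year, month0 + 1),
            # Months absent from the query result get a count of 0.
            'total_prescriptions': counts.get(idx, 0),
        })
    return filled


sample = [
    {'timeline': '2014-12', 'total_prescriptions': 8},
    {'timeline': '2015-1', 'total_prescriptions': 4},
    {'timeline': '2015-3', 'total_prescriptions': 10},
]
result = fill_missing_months(sample)
```

Because the output is rebuilt from scratch rather than patched in place, gaps of several months and gaps that cross a year boundary need no special handling.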