Append missing values to a list of dictionaries

Time: 2015-08-26 21:09:35

Tags: python list dictionary

I have a list that looks like this:

[
{
    "timeline": "2014-10", 
    "total_prescriptions": 17
}, 
{
    "timeline": "2014-11", 
    "total_prescriptions": 14
}, 
{
    "timeline": "2014-12", 
    "total_prescriptions": 8
}, 
{
    "timeline": "2015-1", 
    "total_prescriptions": 4
}, 
{
    "timeline": "2015-3", 
    "total_prescriptions": 10
}, 
{
    "timeline": "2015-4", 
    "total_prescriptions": 3
} 
]

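This is basically the output of a raw SQL query in Django that counts total_prescriptions for each month and sorts the data in ascending order. However, the nature of COUNT in MySQL is that it does not return 0 for empty values. Hence, February is skipped entirely instead of appearing as an entry with total_prescriptions equal to 0.

I plan to iterate over the list in Python and manually add total_prescriptions = 0 for every missing month, so that the output looks like this: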

[
{
    "timeline": "2014-10", 
    "total_prescriptions": 17
}, 
{
    "timeline": "2014-11", 
    "total_prescriptions": 14
}, 
{
    "timeline": "2014-12", 
    "total_prescriptions": 8
}, 
{
    "timeline": "2015-1", 
    "total_prescriptions": 4
}, 
{
    "timeline": "2015-2", 
    "total_prescriptions": 0
}, 
{
    "timeline": "2015-3", 
    "total_prescriptions": 10
}, 
{
    "timeline": "2015-4", 
    "total_prescriptions": 3
} 
]

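How would I go about doing this?

2 answers:

Answer 0: (score: 1)

I ended up using Pandas to solve this, since it is apparently much faster for larger datasets and gets the job done in an elegant way. This is called "resampling" in pandas; first convert your times to numpy datetimes and set them as the index: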

>>> import pandas as pd
>>> df = pd.DataFrame(L) #where L is my list of dictionaries
>>> df.index=pd.to_datetime(df.timeline,format='%Y-%m')
>>> df
timeline    timeline            total_prescriptions                            
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01  2015-1                     4
2015-03-01  2015-3                    10
2015-04-01  2015-4                     3

Then you can use resample('MS') to add the missing months, and fillna(0) to convert the null values to zero:

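>>> # In recent pandas versions resample() returns a Resampler, so call asfreq() before fillna(0)
>>> df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0).astype(int)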
>>> df         
timeline                total_prescriptions
2014-10-01                   17
2014-11-01                   14
2014-12-01                    8
2015-01-01                    4
2015-02-01                    0
2015-03-01                   10
2015-04-01                    3
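
For readers who want to see what the resampling step amounts to without pandas, here is a rough plain-Python sketch. It is not part of the answer above: the function name fill_missing_months is hypothetical, and it assumes a non-empty list whose "timeline" values all look like 'YYYY-M'. It builds every month between the earliest and latest entry and uses 0 wherever the query returned no row, which also gives back a list of dictionaries in the format the question asks for.

```python
def fill_missing_months(entries):
    """Return a new list with a zero-count entry for every skipped month.

    Assumes `entries` is non-empty and uses 'YYYY-M' timeline strings.
    """
    # Map each (year, month) to its count for quick lookup.
    counts = {}
    for entry in entries:
        year, month = (int(part) for part in entry["timeline"].split("-"))
        counts[(year, month)] = entry["total_prescriptions"]

    first, last = min(counts), max(counts)
    filled = []
    year, month = first
    while (year, month) <= last:
        filled.append({
            "timeline": "{}-{}".format(year, month),
            "total_prescriptions": counts.get((year, month), 0),
        })
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return filled


rows = [
    {"timeline": "2014-12", "total_prescriptions": 8},
    {"timeline": "2015-1", "total_prescriptions": 4},
    {"timeline": "2015-3", "total_prescriptions": 10},
]
result = fill_missing_months(rows)
print([r["timeline"] for r in result])
# prints ['2014-12', '2015-1', '2015-2', '2015-3']
```

Comparing (year, month) tuples keeps the year rollover simple, since Python orders tuples element by element.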

Answer 1: (score: 0)

I changed the approach. The list you start with is my_list.

def getDate(entry):
    """
    Given a list entry dict, return a tuple of ints:
    (year, month)
    """
    date = entry['timeline']
    i = date.index('-')
    month = int(date[i + 1:])
    year = int(date[:i])
    return (year, month)

def supplyMissing(year, month, n):
    """
    Given a year, month, & number of missing entries (ints),
    return a list of entries (dicts)
    """
    entries = []
    for e in range(n):
        if month == 12:
            year += 1
            month = 1
        else:
            month += 1
        entries.append({'timeline': str(year) + '-' + str(month),
                        'total_prescriptions': 0})
    return entries

# Make a copy of the list to work with:
new_list = list(my_list)

# Offset added to insert positions to account for entries already inserted
c_count = 0

# Iterate over the list
for i in range(len(my_list) - 1):
    entry = my_list[i]
    next_entry = my_list[i + 1]
    year, month = getDate(entry)
    next_year, next_month = getDate(next_entry)

    if ((next_year == year and next_month == month + 1) or
        (next_year == year + 1 and next_month == month - 11)):
        pass
    # If entries are not sequential, determine what to add.
    else:
        # How many months are missing?
        if next_year == year:
            missing_months = next_month - month - 1
        else:
            dif_years = next_year - year
            missing_months = 12 * dif_years + next_month - month - 1

        # Generate missing entries
        missing_entries = supplyMissing(year, month, missing_months)

        # Insert missing entries into the temporary list.
        for m in range(missing_months):
            new_list.insert(i + 1 + m + c_count, missing_entries[m])
        # Shift later insert positions by the number of entries just added
        c_count += missing_months

# Finalize the result
my_list = new_list
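
To sanity-check the result of either answer, a small helper along these lines could confirm that no month is still missing. This is only a sketch: months_are_contiguous is a hypothetical name, and it assumes every entry has a "timeline" value of the form 'YYYY-M'.

```python
def months_are_contiguous(entries):
    """Return True if every entry is exactly one month after the previous one."""
    # Convert 'YYYY-M' into a single running month number so that
    # December -> January is a step of 1 like any other pair.
    ordinals = []
    for entry in entries:
        year, month = (int(part) for part in entry["timeline"].split("-"))
        ordinals.append(year * 12 + month)
    return all(b - a == 1 for a, b in zip(ordinals, ordinals[1:]))


with_gap = [{"timeline": "2015-1"}, {"timeline": "2015-3"}]
no_gap = [{"timeline": "2014-12"}, {"timeline": "2015-1"}, {"timeline": "2015-2"}]
print(months_are_contiguous(with_gap))  # prints False
print(months_are_contiguous(no_gap))    # prints True
```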