Question

import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df = df.set_index('dates')  # set_index returns a new frame, so assign it back
df.index = df.index.astype('datetime64[ns]')

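Given a DataFrame whose index is a date, how would I add a new column called 'Month' that holds the sum of all values for that row's month, but without looking into the future? That is, it should only add up values from dates on or before the row's own date.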

This is what the column would look like.

'Month': [4, 3, 12]

Answer 1

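You can use pandas TimeGrouper. (pd.TimeGrouper has since been removed from pandas; pd.Grouper(freq='M') is its replacement, and recent versions prefer the alias 'ME' for month end.)

df.groupby(pd.Grouper(freq='M')).sum()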

Answer 2

apply is your friend

import pandas as pd

def sum_from_months_prior(row, df):
    '''Return the sum of values in the same calendar month as `row`,
    over all dates in df up to and including `row`.'''

    month = pd.to_datetime(row).month

    # Keep only rows dated on or before the current row (no future data).
    all_dates_prior = df[df.index <= row]
    # Of those, keep rows in the same calendar month, in any year.
    same_month = all_dates_prior[all_dates_prior.index.month == month]

    return same_month["values"].sum()

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df.set_index('dates', inplace=True)
df.index = pd.to_datetime(df.index)
df["dates"] = df.index  # temporary column so apply() receives each date
df.sort_index(inplace=True)

df["Month"] = df["dates"].apply(lambda row: sum_from_months_prior(row, df))
df.drop("dates", axis=1, inplace=True)

The desired df:

            values  Month
dates
2010-01-29       4      4
2011-06-14       3      3
2012-01-18       8     12
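
For larger frames, the row-by-row apply can probably be replaced with a grouped running total. The following is only a sketch, not part of the original answer: it assumes the index is sorted and the dates are unique, and it groups by calendar month regardless of year, which is what the expected output (12 for 2012-01-18) implies.

```python
import pandas as pd

data = {'dates': ['2010-01-29', '2011-06-14', '2012-01-18'], 'values': [4, 3, 8]}
df = pd.DataFrame(data)
df['dates'] = pd.to_datetime(df['dates'])
df = df.set_index('dates').sort_index()

# Group rows by calendar month (1-12, ignoring the year) and take a running
# total within each group, so each row only sees itself and earlier rows.
df['Month'] = df.groupby(df.index.month)['values'].cumsum()

print(df)
```

If several rows share the same date, this differs from the apply version: the cumulative sum includes only rows that come earlier in the frame, whereas the `<=` comparison counts every row with that date.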

Answer 3

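There are a few ways to do this. The first is to resample to monthly using df.resample(...).sum().

You can also create a month column from the index with df['month'] = df.index.month and then do a groupby operation, df.groupby('month').sum(). Which approach is best depends on what you want to do with the data.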
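
To make the difference between the two groupings concrete, here is a small sketch on the question's data. As an assumption to keep it independent of frequency-alias changes between pandas versions, it groups on df.index.to_period('M') as a stand-in for resampling; unlike resample, that does not emit rows for empty months. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'values': [4, 3, 8]},
    index=pd.to_datetime(['2010-01-29', '2011-06-14', '2012-01-18']),
)

# One total per (year, month): January 2010 and January 2012 stay separate.
per_year_month = df.groupby(df.index.to_period('M'))['values'].sum()

# One total per calendar month across all years: both Januaries are combined.
per_calendar_month = df.groupby(df.index.month)['values'].sum()

print(per_year_month)
print(per_calendar_month)
```

Both produce one total per group rather than a per-row running total, so on their own they do not give the 'Month' column the question asks for.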

Sum of values in the same month
