How do I find and count the missing-month gaps in monthly data?

Time: 2018-06-26 10:27:19

Tags: python pandas dataframe

I have a dataframe like the following:

name,date
AAA,201705
AAA,201706
AAA,201707
AAA,201708
AAA,201710
AAA,201711
AAA,201802
AAA,201803
AAA,201804
AAA,201805
AAA,201806
AAA,201807

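This dataframe has two columns, name and date. The date column holds only the year and month, in yyyymm format.

The months 201709, 201712 and 201801 are missing from the date column.

I need to check whether every month is present. If any months are missing, I need output in the following format: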

name,start_date,end_date,count
AAA,201709,201709,1
AAA,201712,201801,2

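I am trying to use the pandas diff function.

1 answer:

Answer 0: (score: 3)

Use asfreq to reindex each group to a month-start frequency, so that the missing months show up as NaN: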

import pandas as pd

#convert column to datetimes
df['date'] = pd.to_datetime(df['date'], format='%Y%m')
# get missing values by asfreq
a = df.set_index('date').groupby('name')['name'].apply(lambda x: x.asfreq('MS'))
#filter only NaNs consecutive rows
b = a.notnull().cumsum()[a.isnull()].reset_index(name='g')

#aggregate first, last and count
d = {'date':['first','last'],'name':['first', 'size']}
df = b.groupby('g').agg(d).reset_index(drop=True)
#data cleaning
df.columns = df.columns.map('_'.join)
df = df.rename(columns={'date_first':'start_date', 
                        'date_last':'end_date', 
                        'name_first':'name', 
                        'name_size':'count'})
print (df)
  start_date   end_date name  count
0 2017-09-01 2017-09-01  AAA      1
1 2017-12-01 2018-01-01  AAA      2
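
The question asks for start_date and end_date in yyyymm format, while the result above holds full datetimes. A minimal sketch of converting them back, assuming a frame shaped like the printed output (the variable name result and its hand-built contents are hypothetical, used here only so the snippet runs on its own):

```python
import pandas as pd

# Hypothetical frame mirroring the printed result above
result = pd.DataFrame({
    'start_date': pd.to_datetime(['2017-09-01', '2017-12-01']),
    'end_date': pd.to_datetime(['2017-09-01', '2018-01-01']),
    'name': ['AAA', 'AAA'],
    'count': [1, 2],
})

# Format the datetime columns back to yyyymm strings
for col in ['start_date', 'end_date']:
    result[col] = result[col].dt.strftime('%Y%m')

# Reorder the columns to match the layout requested in the question
result = result[['name', 'start_date', 'end_date', 'count']]
print(result)
```

Note that the formatted columns are strings; if integers are needed, .astype(int) could be applied afterwards.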

Details:

print (a)
name  date      
AAA   2017-05-01    AAA
      2017-06-01    AAA
      2017-07-01    AAA
      2017-08-01    AAA
      2017-09-01    NaN
      2017-10-01    AAA
      2017-11-01    AAA
      2017-12-01    NaN
      2018-01-01    NaN
      2018-02-01    AAA
      2018-03-01    AAA
      2018-04-01    AAA
      2018-05-01    AAA
      2018-06-01    AAA
      2018-07-01    AAA
Name: name, dtype: object
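
The grouping key g is built from this Series: notnull().cumsum() is a running count of the months that are present, so it stays constant across a run of consecutive NaNs, and filtering by isnull() keeps only the missing months. A small sketch of that step on a hypothetical excerpt (s, idx, counter and the August to February range are illustrative assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical excerpt of `a`: one value per month, None where the month is missing
idx = pd.date_range('2017-08-01', '2018-02-01', freq='MS')
s = pd.Series(['AAA', None, 'AAA', 'AAA', None, None, 'AAA'], index=idx)

# Running count of present months; it does not advance on missing months
counter = s.notnull().cumsum()

# Keep only the missing months: rows sharing a counter value belong to one gap
g = counter[s.isnull()]
print(g)
```

Here 2017-09 gets its own key, while 2017-12 and 2018-01 share one, which is why grouping by g yields one row per gap with its first month, last month and size.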