How to calculate quarter gaps and add the missing quarters with counts in Python

Asked: 2018-07-17 09:25:00

Tags: python pandas

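I have a DataFrame like the first table below. For each Id I need to find the missing quarters, count how many consecutive quarters are missing in each gap, and produce a DataFrame like the second table.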

year     Data    Id   number
2019Q4   57170   A    1
2019Q3   55150   A    1
2019Q2   51109   A    1
2019Q1   51109   A    1
2018Q1   57170   B    5
2018Q4   55150   B    5
2017Q4   51109   C    7
2017Q2   51109   C    7
2017Q1   51109   C    7

Id  start year  end year  count  number
B   2018Q2      2018Q3    2      5
B   2017Q3      2018Q3    1      5

How can I achieve this with pandas in Python?

1 answer:

Answer 0 (score: 1)

Use:

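import pandas as pd

# convert the quarter strings (e.g. '2018Q2') to quarterly periods
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
# resample each Id to quarterly frequency; asfreq leaves NaN rows for the missing quarters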
df1 = (df.set_index('year')
         .groupby('Id')['Id']
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id   year  val
0   A 2019Q1    A
1   A 2019Q2    A
2   A 2019Q3    A
3   A 2019Q4    A
4   B 2018Q1    B
5   B 2018Q2  NaN
6   B 2018Q3  NaN
7   B 2018Q4    B
8   C 2017Q1    C
9   C 2017Q2    C
10  C 2017Q3  NaN
11  C 2017Q4    C

m = df1['val'].notnull().rename('g')
# label each run of consecutive NaNs: the cumulative count of non-null rows stays constant inside a gap
df1.index = m.cumsum()

# keep only the NaN rows, then take the first quarter, last quarter and size of each gap
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
                     .agg(['first','last','size'])
                     .reset_index(level=1, drop=True)
                     .reset_index())
print (df2)
  Id  first   last  size
0  B 2018Q2 2018Q3     2
1  C 2017Q3 2017Q3     1
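
The cumulative-sum step is the part that is easiest to misread, so here is a minimal sketch of it in isolation. The toy Series `val` is an assumption made up for illustration; it mimics the `val` column of `df1` for Id B and C, with None standing in for a missing quarter.

```python
import pandas as pd

# Toy stand-in for df1['val'] (hypothetical data): None marks a missing quarter.
val = pd.Series(["B", None, None, "B", "C", "C", None, "C"])

present = val.notna()

# The running count of present rows does not change across a run of NaNs,
# so every NaN inside the same gap receives the same label.
run_label = present.cumsum()

# Keep only the labels of the missing rows and count rows per label.
gaps = run_label[~present]
sizes = gaps.groupby(gaps).size().to_dict()

print(gaps.tolist())  # [1, 1, 4]
print(sizes)          # {1: 2, 4: 1}
```

Because each Id is resampled only between its own first and last quarter, every group starts and ends with a present row, so a gap in one Id should never share a label with a gap in another Id.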

Edit:

For a new column that holds the same value within each Id (here number), add it to the groupby:

# convert the quarter strings (e.g. '2018Q2') to quarterly periods
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
# resample each group to quarterly frequency; asfreq leaves NaN rows for the missing quarters
df1 = (df.set_index('year')
         .groupby(['Id','number'])['Id']  # added the number column
         .resample('Q')
         .asfreq()
         .rename('val')
         .reset_index())
print (df1)
   Id  number   year  val
0   A       1 2019Q1    A
1   A       1 2019Q2    A
2   A       1 2019Q3    A
3   A       1 2019Q4    A
4   B       5 2018Q1    B
5   B       5 2018Q2  NaN
6   B       5 2018Q3  NaN
7   B       5 2018Q4    B
8   C       7 2017Q1    C
9   C       7 2017Q2    C
10  C       7 2017Q3  NaN
11  C       7 2017Q4    C

m = df1['val'].notnull().rename('g')
# label each run of consecutive NaNs: the cumulative count of non-null rows stays constant inside a gap
df1.index = m.cumsum()

# keep only the NaN rows, then take the first quarter, last quarter and size of each gap
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']  # added the number column
                     .agg(['first','last','size'])
                     .reset_index(level='g', drop=True)  # drop only the helper level g, keep number
                     .reset_index())
print (df2)
  Id  number  first   last  size
0  B       5 2018Q2 2018Q3     2
1  C       7 2017Q3 2017Q3     1
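
If resampling on a PeriodIndex is not available or raises warnings in your pandas version, the same result can be reached without resample. The following is only a sketch under stated assumptions: `missing_quarter_runs` is a hypothetical helper (not a pandas function), the sample data is copied from the question, and the quarter labels are assumed to be strings such as '2018Q2' that `pd.PeriodIndex` can parse.

```python
import pandas as pd

def missing_quarter_runs(df, keys, col="year"):
    """Return one row per run of consecutive missing quarters in each key group.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    quarters = pd.PeriodIndex(df[col], freq="Q")  # '2018Q2' -> Period('2018Q2', 'Q-DEC')
    rows = []
    for key, grp in df.assign(_q=quarters).groupby(keys):
        key = key if isinstance(key, tuple) else (key,)
        present = set(grp["_q"])
        # Every quarter between the group's first and last observed quarter.
        full = pd.period_range(min(present), max(present), freq="Q")
        run = []
        for q in full:
            if q not in present:
                run.append(q)          # still inside a gap
            elif run:
                # A present quarter closes the current gap.
                rows.append({**dict(zip(keys, key)),
                             "first": str(run[0]),
                             "last": str(run[-1]),
                             "size": len(run)})
                run = []
    return pd.DataFrame(rows, columns=[*keys, "first", "last", "size"])

# Sample data copied from the question.
df = pd.DataFrame({
    "year": ["2019Q4", "2019Q3", "2019Q2", "2019Q1",
             "2018Q1", "2018Q4", "2017Q4", "2017Q2", "2017Q1"],
    "Data": [57170, 55150, 51109, 51109, 57170, 55150, 51109, 51109, 51109],
    "Id": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "number": [1, 1, 1, 1, 5, 5, 7, 7, 7],
})

out = missing_quarter_runs(df, ["Id", "number"])
print(out)  # two rows: B / 5 / 2018Q2..2018Q3 / 2 and C / 7 / 2017Q3 / 1
```

The explicit loop trades speed for readability; for large frames the resample and cumsum approach above is likely the better fit.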