我有这样的数据框,我必须丢失季度值并在它们之间进行计数 与季度丢失计数相同,并填充数据框为
year Data Id number
2019Q4 57170 A 1
2019Q3 55150 A 1
2019Q2 51109 A 1
2019Q1 51109 A 1
2018Q1 57170 B 5
2018Q4 55150 B 5
2017Q4 51109 C 7
2017Q2 51109 C 7
2017Q1 51109 C 7
Id开始年份结束年份计数
B 2018Q2 2018Q3 2 5
B 2017Q3 2018Q3 1 5
如何使用python熊猫实现这一目标
答案 0 :(得分:1)
使用:
#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby('Id')['Id']
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id year val
0 A 2019Q1 A
1 A 2019Q2 A
2 A 2019Q3 A
3 A 2019Q4 A
4 B 2018Q1 B
5 B 2018Q2 NaN
6 B 2018Q3 NaN
7 B 2018Q4 B
8 C 2017Q1 C
9 C 2017Q2 C
10 C 2017Q3 NaN
11 C 2017Q4 C
m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
.agg(['first','last','size'])
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id first last size
0 B 2018Q2 2018Q3 2
1 C 2017Q3 2017Q3 1
编辑:
对于具有相同值的新列,请将其添加到groupby
:
#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby(['Id','number'])['Id'] <- added number columns
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id number year val
0 A 1 2019Q1 A
1 A 1 2019Q2 A
2 A 1 2019Q3 A
3 A 1 2019Q4 A
4 B 5 2018Q1 B
5 B 5 2018Q2 NaN
6 B 5 2018Q3 NaN
7 B 5 2018Q4 B
8 C 7 2017Q1 C
9 C 7 2017Q2 C
10 C 7 2017Q3 NaN
11 C 7 2017Q4 C
m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']
.agg(['first','last','size']) <- added number columns
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id g first last size
0 B 5 2018Q2 2018Q3 2
1 C 8 2017Q3 2017Q3 1