Downsampling a pandas DataFrame - columns are 'downsampled' as well as rows?

Asked: 2017-08-23 07:51:17

Tags: python pandas downsampling

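Unfortunately I cannot share a working example of the problem, because I do not know what is causing it. However, I have put together dummy code that shows the structure of my DataFrame and the downsampling I am attempting:

Example code: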

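import numpy as np
import pandas as pd

#Labels for the three column levels, filled in as each Series is created
department=[]
team=[]
role=[]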

#Department 1 components
department1_A_ROLE1= pd.Series(abs(np.random.randn(5)), index=pd.date_range('01-26-2018',periods=5,freq='B'))
department.append('Department 1')
team.append('A')
role.append('ROLE1')
department1_A_ROLE2= pd.Series(abs(np.random.randn(4)), index=pd.date_range('01-26-2018',periods=4,freq='B'))
department.append('Department 1')
team.append('A')
role.append('ROLE2')


#Department 2 components
department2_B_ROLE1= pd.Series(abs(np.random.randn(7)), index=pd.date_range('01-28-2018',periods=7,freq='B'))
department.append('Department 2')
team.append('B')
role.append('ROLE1')
department2_C_ROLE1= pd.Series(abs(np.random.randn(2)),  index=pd.date_range('02-02-2018',periods=2,freq='B'))
department.append('Department 2')
team.append('C')
role.append('ROLE1')


#Department 3 component
department3_B_ROLE2 = pd.Series(abs(np.random.randn(4)), index=pd.date_range('01-31-2018',periods=4,freq='B'))
department.append('Department 3')
team.append('B')
role.append('ROLE2')



#----Generate multi index columns
arrays=[department, team, role]
tuples = list(zip(*arrays))

df=pd.concat([department1_A_ROLE1, department1_A_ROLE2, department2_B_ROLE1, department2_C_ROLE1, department3_B_ROLE2], axis=1)
dateseries=df.index

index = pd.MultiIndex.from_tuples(tuples, names=['Department', 'Team', 'Resource'])

df.columns=index

[Image: my DataFrame structure]

My actual DataFrame has .shape (18051, 17).

Resampling:

From here, I am trying to .resample by month using the following code:

dfByMonth = df.resample('M').sum()

With the dummy data this works as expected:

[Image: my DataFrame by month]

My actual DataFrame, however, returns .shape (593, 3).

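Notes:

  • The three columns returned always appear to be the same three (all from the same department)
  • The returned columns are not the first or the last when sorted alphabetically
  • Removing the MultiIndex (df.columns = [' '.join(col).strip() for col in df.columns.values]) has no effect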
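Because the same columns survive every time, one way to narrow this down is to look at the dtype of each column rather than at its position or name. The following is a minimal sketch, not taken from the real data: the frame `demo` and its column names are hypothetical, and it only shows how a column holding Python objects stands out from a plain float column.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for the real data: one plain float column and one
# column that mixes floats with Python Decimal objects.
idx = pd.date_range("2018-01-29", periods=4, freq="B")
demo = pd.DataFrame(
    {
        "numeric_col": [1.0, 2.0, 3.0, 4.0],
        "object_col": [1.0, Decimal("2.0"), 3.0, Decimal("4.0")],
    },
    index=idx,
)

# Per-column dtype: a column of mixed Python objects is stored as `object`.
print(demo.dtypes)

# Names of the columns that pandas does not treat as numeric.
non_numeric = demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Columns that show up in `non_numeric` are worth a closer look, since aggregations may handle them differently from float64 columns.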

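Update following JoeCondron's comment:

Running [df.iloc[:,i].apply(type).value_counts() for i in range(df.shape[1])] gives the output below. "Department 4" holds the three columns that .resample() returns, and I see that they are the only columns that are purely float, with no <class 'decimal.Decimal'> values alongside. This looks like the smoking gun, but I do not understand the difference between them: I would have thought that both are numeric and can be resampled? (Note: this data arrives via a Django response.)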

[<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A, ROLE1), dtype: int64, 
<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A ROLE2), dtype: int64, 
<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A ROLE3), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE1), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE2), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE1), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE3), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE1), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE2), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 5, A ROLE1), dtype: int64, 
<class 'float'> 1361 <class 'decimal.Decimal'> 240 Name: (Department 5, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 5, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE1), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 7, A ROLE1), dtype: int64]
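One plausible reading of these counts is that any column containing decimal.Decimal values is stored with object dtype rather than float64, and that the pandas version in use leaves such columns out of .sum(). The sketch below works under that assumption and is not a confirmed diagnosis. The names `raw`, `numeric` and `monthly` are hypothetical, the data is made up, and 'MS' (month start) is used instead of 'M' only so that the snippet runs across pandas versions, since newer releases spell month end as 'ME'.

```python
from decimal import Decimal

import pandas as pd

# Five business days spanning the end of January and the start of February.
idx = pd.date_range("2018-01-29", periods=5, freq="B")

# Hypothetical data: one column mixing Decimal and float, one pure float.
raw = pd.DataFrame(
    {
        "dept3": [Decimal("1.5"), 2.0, Decimal("0.5"), 4.0, Decimal("1.0")],
        "dept4": [1.0, 2.0, 3.0, 4.0, 5.0],
    },
    index=idx,
)
raw.columns = pd.MultiIndex.from_tuples(
    [("Department 3", "A", "ROLE1"), ("Department 4", "A", "ROLE1")],
    names=["Department", "Team", "Resource"],
)

# The mixed column is `object`, the pure float column is `float64`.
print(raw.dtypes)

# Coerce every column to float before aggregating (assumes that losing
# exact decimal precision is acceptable for these values).
numeric = raw.astype(float)

# Monthly totals; both columns are now numeric, so both are kept.
monthly = numeric.resample("MS").sum()
print(monthly)
```

If exact decimal arithmetic matters, converting to float may not be appropriate, and the conversion could instead be done where the values leave Django.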

0 answers:

No answers yet