How to create multiple DataFrames from a pandas groupby object

Time: 2016-02-08 16:40:54

Tags: python pandas

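I am trying to create new DataFrames by using groupby on a MultiIndex DataFrame, df. Level 0 is a string identifier and level 1 is a DatetimeIndex. Ultimately, I want to determine the total time each vsl is associated with each DIV and DIS. Here is a snippet of df: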

                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
vsl2   2015-06-11 07:10:00  NWD NWP
       2015-06-11 16:35:00  NWD NWP
       2015-06-12 01:50:00  NWD NWP
       2015-06-12 11:25:00  NWD NWP
       ...
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[6594 rows x 2 columns]

I have looked at "How to iterate over pandas multiindex dataframe using index" and came up with this, which is not quite what I want:

for vsl, new_df in df.groupby(level=0):
    vsl = new_df

I expected new DataFrames ['vsl1', 'vsl2', 'vsl3'], each containing the contents of the corresponding groupby DataFrame, i.e. for vsl1:

                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
[411 rows x 2 columns]

If I call vsl1:

In [102]: vsl1
Traceback (most recent call last):

  File "<ipython-input-102-7a5664be723c>", line 1, in <module>
    vsl1

NameError: name 'vsl1' is not defined

If I call vsl:

In [103]: vsl
Out[103]:
                            DIV DIS
vsl    BeginTime            
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[412 rows x 2 columns]

As a test, I tried printing, as shown in the referenced post:

In [104]: for vsl, new_df in df.groupby(level=0):
     ...:    print(new_df)
     ...:
Out[104]:
                            DIV DIS
vsl    BeginTime            
vsl1   2015-08-19 16:40:00  SAD SAJ  
       2015-08-20 03:45:00  SAD SAJ   
       2015-08-20 13:55:00  SAD SAJ
       ...
[411 rows x 2 columns]
                            DIV DIS
vsl    BeginTime            
vsl2   2015-06-11 07:10:00  NWD NWP
       2015-06-11 16:35:00  NWD NWP
       2015-06-12 01:50:00  NWD NWP
       2015-06-12 11:25:00  NWD NWP
       ...
[410 rows x 2 columns]
                            DIV DIS
vsl    BeginTime            
vsl3   2015-06-24 02:40:00  MVD MVN
       2015-06-24 06:50:00  MVD MVN
       2016-01-21 13:05:00  NAD NAN
       2016-01-21 23:35:00  NAD NAN
       ...
[412 rows x 2 columns]

What am I missing, and how can I create a new DataFrame for each vsl contained in level 0?
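For reference, the loop above rebinds the single name vsl on every iteration, so after it finishes vsl holds only the last group (vsl3) and no variable called vsl1 is ever created. One common way around this is to collect the groups in a dict keyed by the level-0 label. The following is a minimal sketch, not a definitive answer: the name `frames` is hypothetical, and the small df built here is only a stand-in using a few rows copied from the snippet above.

```python
import pandas as pd

# Stand-in for df: a few rows copied from the snippet (the real frame has 6594 rows).
index = pd.MultiIndex.from_tuples(
    [
        ("vsl1", pd.Timestamp("2015-08-19 16:40:00")),
        ("vsl1", pd.Timestamp("2015-08-20 03:45:00")),
        ("vsl2", pd.Timestamp("2015-06-11 07:10:00")),
        ("vsl3", pd.Timestamp("2015-06-24 02:40:00")),
        ("vsl3", pd.Timestamp("2016-01-21 13:05:00")),
    ],
    names=["vsl", "BeginTime"],
)
df = pd.DataFrame(
    {
        "DIV": ["SAD", "SAD", "NWD", "MVD", "NAD"],
        "DIS": ["SAJ", "SAJ", "NWP", "MVN", "NAN"],
    },
    index=index,
)

# Build one DataFrame per level-0 label, keyed by that label.
# `frames` is a hypothetical name chosen for this sketch.
frames = {name: group for name, group in df.groupby(level=0)}

# Look up a single vessel's DataFrame by its label.
vsl1 = frames["vsl1"]
print(sorted(frames))
print(vsl1)
```

A dict keeps the number of variables independent of the data, which is usually easier to work with than creating names such as vsl1, vsl2 and vsl3 dynamically.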

0 answers:

No answers