How to store pandas groupby() groups in a single variable under different indexes

Time: 2020-10-29 09:11:38

Tags: python pandas dataframe pandas-groupby

Suppose I have a DataFrame df with three columns:

df=
id  date       value
A  02-04-2000  3
A  03-04-2000  8
B  04-04-2000  12
B  02-04-2000  7
C  03-04-2000  5
C  04-04-2000  2
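
To try the answers locally, the frame above can be rebuilt roughly like this. It is a minimal sketch that assumes the date column holds plain strings, since the question does not say whether it was parsed into datetimes.

```python
import pandas as pd

# Rebuild the example frame from the question.
# Assumption: dates are kept as plain strings, not parsed datetimes.
df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})
print(df)
```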

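I want to group the data by the df['id'] column and store the result in a variable new. new should hold the groups so that new[1] returns the rows for id = A without the id column, new[2] returns the rows for id = B, and so on.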

Example output:

new[1]=
date       value
02-04-2000  3
03-04-2000  8

new[2]=
date        value
04-04-2000  12
02-04-2000  7

2 Answers:

Answer 0 (score: 1)

All of the solutions below use DataFrame.groupby and remove the id column with DataFrame.drop.

If indexing from 0, 1, ... is acceptable, output a list of DataFrames:

new = [g.drop('id', axis=1) for _, g in df.groupby('id')]
print(new[0])
         date  value
0  02-04-2000      3
1  03-04-2000      8
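
One detail worth knowing with the list approach: groupby sorts the group keys by default, so the list positions follow sorted id order rather than the order in which ids first appear. The sketch below uses a small hypothetical frame (not the question's data) to show the difference when sort=False is passed.

```python
import pandas as pd

# Hypothetical frame whose ids are not in alphabetical order.
df = pd.DataFrame({
    "id": ["B", "B", "A", "A"],
    "date": ["04-04-2000", "02-04-2000", "02-04-2000", "03-04-2000"],
    "value": [12, 7, 3, 8],
})

# Default (sort=True): groups come out in sorted key order, so A is first.
by_key = [g.drop(columns="id") for _, g in df.groupby("id")]

# sort=False: groups come out in order of first appearance, so B is first.
by_appearance = [g.drop(columns="id") for _, g in df.groupby("id", sort=False)]

print(by_key[0])
print(by_appearance[0])
```

With the question's data the two orders coincide, because A, B, C already appear in sorted order.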

If the output should be a dictionary of DataFrames keyed 1, 2, ..., group by consecutive runs of id:

new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}
print(new[1])
         date  value
0  02-04-2000      3
1  03-04-2000      8
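
The grouping key df['id'].ne(df['id'].shift()).cumsum() is dense, so here is a sketch that splits it into its three steps on the question's data. The intermediate names (previous, starts_run, key) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# Step 1: the id of the previous row (NaN for the first row).
previous = df["id"].shift()

# Step 2: True wherever the id differs from the previous row,
# i.e. wherever a new run of identical ids starts.
starts_run = df["id"].ne(previous)

# Step 3: a running count of run starts gives each run its own number.
key = starts_run.cumsum()

print(pd.DataFrame({"id": df["id"], "starts_run": starts_run, "key": key}))
```

Passing key to df.groupby then yields one group per run, with integer keys starting at 1.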

A similar solution keyed by the id values themselves (no consecutive groups):

new1 = {k: g.drop('id', axis=1) for k, g in df.groupby('id')}
print(new1['A'])
         date  value
0  02-04-2000      3
1  03-04-2000      8
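
If only one or two groups are needed, building the whole dictionary may be unnecessary. As a hedged alternative, the groupby object can be kept in a variable and queried with get_group; the returned frame still contains the id column, so it is dropped afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

grouped = df.groupby("id")

# Fetch a single group by its key, then remove the grouping column.
group_a = grouped.get_group("A").drop(columns="id")
print(group_a)
```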

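To compare grouping by id value with grouping by consecutive runs, take different data in which A appears in two separate runs. First, grouping by id value with pd.factorize:

print(df)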

  id        date  value
0  A  02-04-2000      3
1  A  03-04-2000      8
2  B  04-04-2000     12
3  A  02-04-2000      7
4  A  03-04-2000      5
5  C  04-04-2000      2
    
new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(pd.factorize(df['id'])[0] + 1)}

# all A rows are in the first group
print(new[1])
         date  value
0  02-04-2000      3
1  03-04-2000      8
3  02-04-2000      7
4  03-04-2000      5


# all C rows are in the third group
print(new[3])
         date  value
5  04-04-2000      2
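
For readers unfamiliar with pd.factorize: it returns a pair (codes, uniques), where codes numbers each distinct value from 0 in order of first appearance. The sketch below shows both parts for the ids used above; the labels mapping is an illustrative extra, not part of the answer.

```python
import pandas as pd

ids = pd.Series(["A", "A", "B", "A", "A", "C"], name="id")

# codes: one integer per row; uniques: the distinct ids in first-seen order.
codes, uniques = pd.factorize(ids)

# Shift by one so the group numbers start at 1, as in the answer.
group_numbers = codes + 1

# Optional lookup from group number back to the original id.
labels = {number: label for number, label in enumerate(uniques, start=1)}

print(group_numbers)
print(labels)
```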

Grouping by consecutive runs:

print(df)

  id        date  value
0  A  02-04-2000      3 <- group 1
1  A  03-04-2000      8 <- group 1
2  B  04-04-2000     12 <- group 2
3  A  02-04-2000      7 <- group 3
4  A  03-04-2000      5 <- group 3
5  C  04-04-2000      2 <- group 4
    

new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}

# first group
print(new[1])
         date  value
0  02-04-2000      3
1  03-04-2000      8

# third group (the second run of A)
print(new[3])
         date  value
3  02-04-2000      7
4  03-04-2000      5

Answer 1 (score: 0)

# build a dict of sub-frames keyed by position (0, 1, ...)
new = {}
# unique id values, in order of first appearance
unique_ids = df['id'].unique()

# note: keys start at 0 and the id column is kept
for index, value in enumerate(unique_ids):
    new[index] = df[df['id'] == value].copy()
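
The loop above numbers the groups from 0 and leaves the id column in place, whereas the question asks for new[1], new[2], ... without id. A variant closer to that request might look like the following sketch; it is an adaptation for illustration, not the answer author's code.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "date": ["02-04-2000", "03-04-2000", "04-04-2000",
             "02-04-2000", "03-04-2000", "04-04-2000"],
    "value": [3, 8, 12, 7, 5, 2],
})

# Number the unique ids from 1 and drop the id column from each sub-frame.
new = {
    number: df[df["id"] == value].drop(columns="id")
    for number, value in enumerate(df["id"].unique(), start=1)
}

print(new[1])
print(new[2])
```

Because drop returns a new frame, the explicit .copy() is no longer needed here.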