Suppose I have a DataFrame df with three columns:
df =
id date value
A 02-04-2000 3
A 03-04-2000 8
B 04-04-2000 12
B 02-04-2000 7
C 03-04-2000 5
C 04-04-2000 2
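I would like to group the data by the df['id'] column and store the result in a variable new. new should store the values so that new[1] returns the rows corresponding to id = A, without the id column, new[2] returns the rows corresponding to id = B, and so on.
Sample output: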
new[1]=
date value
02-04-2000 3
03-04-2000 8
new[2]=
date value
04-04-2000 12
02-04-2000 7
Answer 0 (score: 1)
For all of these solutions, use DataFrame.groupby and remove the id column with DataFrame.drop.
If indexing from 0, 1, ... is acceptable, output a list of DataFrames:
new = [g.drop('id', axis=1) for _, g in df.groupby('id')]
print (new[0])
date value
0 02-04-2000 3
1 03-04-2000 8
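For anyone who wants to run the list version end to end, here is a minimal sketch that rebuilds the sample data from the question. The DataFrame construction is my own assumption about the column types (dates are kept as plain strings). Note that groupby sorts the group keys by default, and each group keeps its original row index.

```python
import pandas as pd

# Sample data copied from the question (dates left as strings).
df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['02-04-2000', '03-04-2000', '04-04-2000',
             '02-04-2000', '03-04-2000', '04-04-2000'],
    'value': [3, 8, 12, 7, 5, 2],
})

# One DataFrame per id, in sorted id order, with the id column dropped.
new = [g.drop('id', axis=1) for _, g in df.groupby('id')]

print(len(new))  # 3
print(new[1])    # rows for id = B; original index labels 2 and 3 are kept
```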
If the output should be a dictionary of DataFrames with keys 1, 2, ..., group by consecutive runs of id:
new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
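The grouping key df['id'].ne(df['id'].shift()).cumsum() can be hard to read at first, so here is a small step-by-step sketch. The ids Series is an illustrative assumption (A occurs in two separate runs) and is not the data from the question.

```python
import pandas as pd

# Illustrative id column in which A appears in two separate runs.
ids = pd.Series(['A', 'A', 'B', 'A', 'A', 'C'])

# True wherever the value differs from the previous row.
# The first row is compared against a missing value, so it is True.
changed = ids.ne(ids.shift())

# The cumulative sum of the booleans numbers each run, starting at 1.
run_id = changed.cumsum()

print(changed.tolist())  # [True, False, True, True, False, True]
print(run_id.tolist())   # [1, 1, 2, 3, 3, 4]
```

Every change of id starts a new number, so the keys of the resulting dictionary are 1, 2, ... in row order. When each id forms a single run, as in the question, this gives one key per id.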
A similar solution keyed by the id values themselves (no consecutive groups):
new1 = {k: g.drop('id', axis=1) for k, g in df.groupby('id')}
print (new1['A'])
date value
0 02-04-2000 3
1 03-04-2000 8
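To explain how grouping by consecutive groups differs, here is another dataset. First, grouping by the id values, numbered from 1 with pd.factorize: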
print (df)
id date value
0 A 02-04-2000 3
1 A 03-04-2000 8
2 B 04-04-2000 12
3 A 02-04-2000 7
4 A 03-04-2000 5
5 C 04-04-2000 2
new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(pd.factorize(df['id'])[0]+1)}
# all A rows are in the first group
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
3 02-04-2000 7
4 03-04-2000 5
# all C rows are in the third group
print (new[3])
date value
5 04-04-2000 2
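To see why pd.factorize(df['id'])[0]+1 puts all A rows under key 1 and all C rows under key 3, it may help to look at what factorize returns. This is a small sketch on a standalone Series with the same id sequence as the data above.

```python
import pandas as pd

ids = pd.Series(['A', 'A', 'B', 'A', 'A', 'C'])

# factorize returns (codes, uniques); the codes number the distinct
# values from 0 in order of first appearance.
codes, uniques = pd.factorize(ids)

print(codes.tolist())        # [0, 0, 1, 0, 0, 2]
print(list(uniques))         # ['A', 'B', 'C']
print((codes + 1).tolist())  # [1, 1, 2, 1, 1, 3]
```

Adding 1 only shifts the keys so that they start at 1 instead of 0.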
Grouping by consecutive groups:
print (df)
id date value
0 A 02-04-2000 3 <- 1group
1 A 03-04-2000 8 <- 1group
2 B 04-04-2000 12 <- 2group
3 A 02-04-2000 7 <- 3group
4 A 03-04-2000 5 <- 3group
5 C 04-04-2000 2 <- 4group
new = {k: g.drop('id', axis=1)
       for k, g in df.groupby(df['id'].ne(df['id'].shift()).cumsum())}
#first group
print (new[1])
date value
0 02-04-2000 3
1 03-04-2000 8
# third group (the second run of A rows); the fourth group is new[4]
print (new[3])
date value
3 02-04-2000 7
4 03-04-2000 5
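Answer 1 (score: 0)
# build a dict of DataFrames keyed by position (0, 1, ...)
new = {}
# unique id values, in order of first appearance
unique_ids = df['id'].unique()
for index, value in enumerate(unique_ids):
    # rows for this id; note that the id column is kept here
    new[index] = df[df['id'] == value].copy()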
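The loop above numbers the keys from 0 and keeps the id column, whereas the question asks for new[1] to be the A rows without id. One possible variant is sketched below with the sample data from the question rebuilt by hand. It is my own adaptation, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (dates left as strings).
df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'C', 'C'],
    'date': ['02-04-2000', '03-04-2000', '04-04-2000',
             '02-04-2000', '03-04-2000', '04-04-2000'],
    'value': [3, 8, 12, 7, 5, 2],
})

# Keys start at 1 in order of first appearance; the id column is dropped.
new = {
    key: df[df['id'] == value].drop(columns='id')
    for key, value in enumerate(df['id'].unique(), start=1)
}

print(new[2])  # rows for id = B
```

This filters the frame once per id, so for many distinct ids the groupby-based solutions are likely to be faster.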