I'm trying to manipulate a DataFrame, but I can't seem to reformat it the way I want.
I have:
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
...     'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog']})
>>> df
  person    pet
0     Al    Cat
1     Al    Dog
2    Bob   Fish
3    Bob   Fish
4    Bob  Zebra
5    Sue   Fish
6    Sue    Dog
I want to aggregate to the person level, with nested labels like this:
  person  pet_info
          pet    number
0 Al      Cat    1
          Dog    1
1 Bob     Fish   2
          Zebra  1
....
so that the pet_info column has two labels/column names underneath it, and:
for row in df:
    print(row['person'])
    for stuff in row['pet_info']:
        print(stuff['pet'])
would output:
Al
Cat
Dog
Bob
Fish
...
Any ideas on how to do this? I can't seem to get this transformation to work, and I'm fairly familiar with pandas...
Thanks!
Answer 0 (score: 0)
A simple groupby + count / size should do it.
df2 = df.groupby(['person', 'pet']).pet.count()\
        .to_frame('number').reset_index(level=1)
df2
          pet  number
person
Al        Cat       1
Al        Dog       1
Bob      Fish       2
Bob     Zebra       1
Sue       Dog       1
Sue      Fish       1
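For comparison, here is a minimal sketch of the size variant mentioned above. It should give the same table for this data; the two differ only when the counted column contains missing values, because size counts every row in a group while count skips NaN. The name counts is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
    'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog']})

# size() counts all rows in each (person, pet) group, including rows with NaN.
# 'counts' is an illustrative name, not part of the original answer.
counts = (df.groupby(['person', 'pet'])
            .size()
            .to_frame('number')
            .reset_index(level=1))
print(counts)
```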
Now, assign a MultiIndex to df2.columns:
idx = pd.MultiIndex.from_product([['pet_info'], df2.columns])
df2.columns = idx
df2 = df2.reset_index()
df2
  person pet_info
              pet number
0     Al      Cat      1
1     Al      Dog      1
2    Bob     Fish      2
3    Bob    Zebra      1
4    Sue      Dog      1
5    Sue     Fish      1
Now you can index into each level with df2['pet_info']['pet']. If you want to produce the output shown in your question, there is no getting around groupby:
for n, g in df2.groupby('person'):
    print(n)
    for p in g.pet_info.pet:
        print(p)
Al
Cat
Dog
Bob
Fish
Zebra
Sue
Dog
Fish
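If the goal is specifically to make the loop from the question work as written, one option is to leave the DataFrame behind at the last step and build a plain list of nested dicts. This is only a sketch under that assumption, not part of the approach above; the names counts and records are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
    'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog']})

# Flat table with one row per (person, pet) pair and its count.
counts = df.groupby(['person', 'pet']).size().reset_index(name='number')

# One dict per person; 'pet_info' holds a list of {'pet': ..., 'number': ...} dicts.
records = [
    {'person': person,
     'pet_info': group[['pet', 'number']].to_dict('records')}
    for person, group in counts.groupby('person')
]

# The loop from the question, iterating over the nested records.
for row in records:
    print(row['person'])
    for stuff in row['pet_info']:
        print(stuff['pet'])
```

Note that groupby sorts its keys by default, so the pets for each person come out in alphabetical order rather than in their original row order.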