Embedding a 2D labeled array in Pandas

Time: 2017-11-19 23:23:43

Tags: python pandas dataframe

I'm trying to reshape a DataFrame, but I can't seem to get it into the format I want.

I have:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
...     'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog']})
>>> df
  person    pet
0     Al    Cat
1     Al    Dog
2    Bob   Fish
3    Bob   Fish
4    Bob  Zebra
5    Sue   Fish
6    Sue    Dog

I want to aggregate to the person level and nest the labels like this:

   person  pet_info
           pet    number
0  Al      Cat    1
           Dog    1
1  Bob     Fish   2  
           Zebra  1
....

so that the pet_info column has two labels/column names under it, such that:

for row in df:
    print(row['person'])
    for stuff in row['pet_info']:
        print(stuff['pet'])

would output:

Al
Cat
Dog
Bob
Fish
...

Any ideas on how to do this? I can't seem to get this transformation to work, and I'm fairly familiar with pandas...

Thanks!

1 Answer:

Answer 0: (score: 0)

A simple groupby + count / size should do it.

df2 = df.groupby(['person', 'pet']).pet.count()\
               .to_frame('number').reset_index(level=1)

df2

          pet  number
person               
Al        Cat       1
Al        Dog       1
Bob      Fish       2
Bob     Zebra       1
Sue       Dog       1
Sue      Fish       1
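
For comparison, here is a minimal sketch of the size variant mentioned above. It rebuilds the question's df so it runs on its own; the name counts is only illustrative and not part of the answer. On this data it should give the same table as count(), since the pet column has no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
    'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog'],
})

# size() counts the rows in each (person, pet) group; count() counts only
# the non-null entries of the selected column.
counts = (
    df.groupby(['person', 'pet'])
      .size()
      .to_frame('number')      # name the count column
      .reset_index(level=1)    # move 'pet' out of the index, keep 'person'
)
print(counts)
```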

Now, assign a MultiIndex to df2.columns:

idx = pd.MultiIndex.from_product([['pet_info'], df2.columns])
df2.columns = idx
df2 = df2.reset_index()

df2

  person pet_info       
              pet number
0     Al      Cat      1
1     Al      Dog      1
2    Bob     Fish      2
3    Bob    Zebra      1
4    Sue      Dog      1
5    Sue     Fish      1
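
As a rough illustration of how the two-level columns can be addressed, the sketch below rebuilds df2 from scratch (using size() instead of count(), which is a substitution on my part) and selects from it in two ways. The names pet_info and pets are hypothetical helpers for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
    'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog'],
})

df2 = (
    df.groupby(['person', 'pet'])
      .size()
      .to_frame('number')
      .reset_index(level=1)
)
# Put both columns under a single top-level label, 'pet_info'.
df2.columns = pd.MultiIndex.from_product([['pet_info'], df2.columns])
df2 = df2.reset_index()

# A top-level label returns the sub-frame whose columns are the inner labels.
pet_info = df2['pet_info']

# A (top, inner) tuple selects one column directly as a Series.
pets = df2[('pet_info', 'pet')]

print(pet_info.columns.tolist())
print(pets.tolist())
```

The tuple form avoids the chained lookup df2['pet_info']['pet'], which is usually the safer habit when assigning values.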

Now you can index each level with df2['pet_info']['pet']. If you want to produce the output shown in your question, there is no escaping a groupby:

for n, g in df2.groupby('person'):
     print(n)
     for p in g.pet_info.pet:
         print(p)

Al
Cat
Dog
Bob
Fish
Zebra
Sue
Dog
Fish
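
If the goal is to make the loop from the question run almost verbatim, one possible alternative (not part of the answer above) is to leave the DataFrame behind and build plain Python records. This is a sketch under that assumption; the names counts and records are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'person': ['Al', 'Al', 'Bob', 'Bob', 'Bob', 'Sue', 'Sue'],
    'pet': ['Cat', 'Dog', 'Fish', 'Fish', 'Zebra', 'Fish', 'Dog'],
})

# One row per (person, pet) pair with its count in a 'number' column.
counts = df.groupby(['person', 'pet']).size().reset_index(name='number')

# One dict per person; 'pet_info' holds a list of {'pet', 'number'} dicts.
records = [
    {
        'person': person,
        'pet_info': group[['pet', 'number']].to_dict(orient='records'),
    }
    for person, group in counts.groupby('person')
]

# The loop from the question, iterating over the records instead of df.
for row in records:
    print(row['person'])
    for stuff in row['pet_info']:
        print(stuff['pet'])
```

The trade-off is that records is an ordinary list of dicts, so vectorized pandas operations are no longer available on it.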