How to convert a groupby result to a DataFrame

Date: 2017-05-15 07:19:15

Tags: python pandas

I have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({
               'category': ['ctr','ctr','ctr','ctr','ctr','ctr'],
               'expected_count': [100,100,112,1.3,14,125],
               'sample_id': ['S1','S1','S1','S2','S2','S2'],
               'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
               })

which produces:

In [2]: df
Out[2]:
  category  expected_count gene_symbol sample_id
0      ctr           100.0           a        S1
1      ctr           100.0           b        S1
2      ctr           112.0           c        S1
3      ctr             1.3           a        S2
4      ctr            14.0           b        S2
5      ctr           125.0           c        S2

Grouping by gene symbol works without any problem:

In [4]: gdf = df.groupby(by = 'gene_symbol')['expected_count'].mean()
   ...: gdf
   ...:
Out[4]:
gene_symbol
a     50.65
b     57.00
c    118.50
Name: expected_count, dtype: float64

In [5]: str(gdf)
Out[5]: 'gene_symbol\na     50.65\nb     57.00\nc    118.50\nName: expected_count, dtype: float64'

Note that gdf is a string. How can I convert it to a DataFrame?

2 answers:

Answer 0 (score: 1)

You need as_index=False or reset_index:

gdf = df.groupby('gene_symbol', as_index=False)['expected_count'].mean()
print (gdf)
  gene_symbol  expected_count
0           a           50.65
1           b           57.00
2           c          118.50
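A self-contained version of the as_index=False approach, rebuilding the question's data, that checks what kind of object comes back. This is a sketch assuming a reasonably recent pandas; the printed values are the type name and column labels only.

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame({
    'category': ['ctr'] * 6,
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# as_index=False keeps the grouping key as a regular column
# instead of moving it into the index
gdf = df.groupby('gene_symbol', as_index=False)['expected_count'].mean()

print(type(gdf).__name__)  # DataFrame
print(list(gdf.columns))   # ['gene_symbol', 'expected_count']
```

The result gets a default integer index (0, 1, 2), matching the output shown above.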

Or:

gdf = df.groupby('gene_symbol')['expected_count'].mean().reset_index()
print (gdf)
  gene_symbol  expected_count
0           a           50.65
1           b           57.00
2           c          118.50
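If the value column should get a different label at the same time, Series.reset_index also accepts a name argument. A minimal sketch; the label mean_count is hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr'] * 6,
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# A Series indexed by gene_symbol
means = df.groupby('gene_symbol')['expected_count'].mean()

# reset_index moves the index into a column; name= sets the label of the
# value column (by default it is the Series name, 'expected_count')
gdf = means.reset_index(name='mean_count')  # 'mean_count' is a hypothetical label

print(list(gdf.columns))  # ['gene_symbol', 'mean_count']
```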

The output is not a string but a Series:

print (type(df.groupby('gene_symbol')['expected_count'].mean()))
<class 'pandas.core.series.Series'>
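To see the difference directly: str(gdf) only renders the Series as text, while gdf itself keeps the index name and series name that reset_index() and to_frame() use to label the columns. A short sketch rebuilding the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr'] * 6,
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

gdf = df.groupby('gene_symbol')['expected_count'].mean()

# The object is a Series; str() just produces its printed representation
print(isinstance(gdf, pd.Series))  # True
print(isinstance(str(gdf), str))   # True

# Metadata used to build column labels when converting to a DataFrame
print(gdf.index.name)  # gene_symbol
print(gdf.name)        # expected_count
```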

Answer 1 (score: 1)

You can use:

gdf = df.groupby(by = 'gene_symbol')['expected_count'].mean().to_frame()

gdf
Out[149]: 
             expected_count
gene_symbol                
a                     50.65
b                     57.00
c                    118.50
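Note that to_frame() keeps gene_symbol in the index. A sketch of the variants, assuming you may want either an index-keyed frame, a flat table, or a renamed value column (the label mean_count is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['ctr'] * 6,
    'expected_count': [100, 100, 112, 1.3, 14, 125],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
})

means = df.groupby('gene_symbol')['expected_count'].mean()

framed = means.to_frame()                    # gene_symbol stays as the index
flat = means.to_frame().reset_index()        # gene_symbol becomes a column
renamed = means.to_frame(name='mean_count')  # hypothetical column label

print(framed.index.name)      # gene_symbol
print(list(flat.columns))     # ['gene_symbol', 'expected_count']
print(list(renamed.columns))  # ['mean_count']
```

Keeping gene_symbol as the index can be convenient for index-based joins; adding reset_index() gives the same flat layout as the as_index=False approach.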