Rename a Pandas index inside a function

Date: 2017-07-17 20:50:38

Tags: python pandas dataframe

I have the following DataFrame:

nationwide_measures = pd.read_sql_query("""select state,
          measure_id,
          measure_name,
          score
from timely_and_effective_care___hospital;""", conn)

I created this function:

# Function to grab measure values
def get_stats(group):
    df = pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}, index = [0])
    return df

# Function output
nationwide_measure_results = nationwide_measures1['score'].groupby(nationwide_measures1['measure_id']).apply(get_stats).unstack()

The output is a DataFrame with the following 5 columns:

index | ('Average', 0) | ('Maximum', 0) | ('Minimum', 0) | ('Standard Deviation', 0)

How can I change the output so that it has these 6 renamed columns:

Measure ID | Measure Name | Average | Maximum | Minimum | Standard Deviation

I tried both of the following:

df = pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}, index = [0], columns=["Measure ID", "Average", "Maximum", "Minimum", "Standard Deviation"])

df.columns = ["Measure ID", "Average", "Maximum", "Minimum", "Standard Deviation"]

inside the function, but neither worked.

2 Answers:

Answer 0: (score: 0)

Let's try this example.

import pandas as pd
import numpy as np

df = pd.DataFrame({'state':np.random.choice(['TX','CA','NY'],100),'measure_id':np.random.randint(1,5,100),'measure_name':np.nan,'score':np.random.randint(50,100,100)})

# Named measure_names rather than dict so the built-in dict type is not shadowed
measure_names = {1:'Measure A',2:'Measure B',3:'Measure C',4:'Measure D',5:'Measure E'}

df['measure_name'] = df['measure_id'].map(measure_names)

Input data (first five rows):

   measure_id measure_name  score state
0           3    Measure C     82    CA
1           3    Measure C     93    CA
2           4    Measure D     69    NY
3           1    Measure A     56    NY
4           4    Measure D     66    CA

df_out=(df.groupby(['measure_id','measure_name'])['score'].agg(['mean','max','min','std'])
         .rename(columns={'mean':'Average','max':'Maximum','min':'Minimum','std':'Standard Deviation'})
         .rename_axis(['Measure ID','Measure Name'])
         .reset_index())

print(df_out)

Output:

   Measure ID Measure Name    Average  Maximum  Minimum  Standard Deviation
0           1    Measure A  74.346154       99       53           13.734460
1           2    Measure B  70.720000       97       50           16.084465
2           3    Measure C  76.130435       97       51           14.943239
3           4    Measure D  77.576923       97       56           10.756107
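
If you would rather keep a custom function, here is a minimal sketch of one way to do it. As far as I can tell, the `('Average', 0)` style column names in the question come from `get_stats` returning a one-row DataFrame with `index = [0]`: `apply` stacks those rows under a `(measure_id, 0)` index, and `unstack()` then moves the `0` level into the columns. Returning a Series keyed by statistic name avoids that extra level. The sample data below is made up and stands in for the SQL query result.

```python
import pandas as pd

# Made-up sample data standing in for the SQL query result (an assumption).
nationwide_measures = pd.DataFrame({
    'measure_id': [1, 1, 2, 2],
    'measure_name': ['Measure A', 'Measure A', 'Measure B', 'Measure B'],
    'score': [1.0, 3.0, 10.0, 20.0],
})

def get_stats(group):
    # Return a Series keyed by statistic name instead of a one-row DataFrame.
    return pd.Series({
        'Average': group.mean(),
        'Maximum': group.max(),
        'Minimum': group.min(),
        'Standard Deviation': group.std(),
    })

result = (
    nationwide_measures
    .groupby(['measure_id', 'measure_name'])['score']
    .apply(get_stats)   # Series indexed by (measure_id, measure_name, statistic)
    .unstack()          # statistic names become plain column labels
    .reset_index()      # turn the two group keys back into ordinary columns
    .rename(columns={'measure_id': 'Measure ID', 'measure_name': 'Measure Name'})
)

# Select the columns explicitly so the order does not depend on unstack().
result = result[['Measure ID', 'Measure Name', 'Average',
                 'Maximum', 'Minimum', 'Standard Deviation']]
print(result)
```

Grouping on both `measure_id` and `measure_name` is what carries the name through to the result; grouping on `measure_id` alone, as in the question, leaves no place for it to come from.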

Answer 1: (score: 0)

First, here is one way to add a new column to a DataFrame.

df['Measure ID'] = pd.Series(df.index.values)

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'Minimum': [1,1], 'Maximum': [0,0], 'Average': [0,1], 'Standard Deviation': [1,324]}, index = [0,1], columns=["Average", "Maximum", "Minimum", "Standard Deviation"])
>>> df
   Average  Maximum  Minimum  Standard Deviation
0        0        0        1                   1
1        1        0        1                 324
>>> df['Measure ID'] = pd.Series(df.index.values)
>>> df
   Average  Maximum  Minimum  Standard Deviation  Measure ID
0        0        0        1                   1           0
1        1        0        1                 324           1

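It is hard to give you exactly what you need, because we don't have all of the required input. But you can add further columns following the same pattern: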

df['NEW COLUMN NAME'] = pd.Series(NEW_COLUMN_DATA)
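
One caveat with this pattern, sketched below with made-up numbers: assigning a `pd.Series` aligns on index labels, not on position. It works in the example above only because `df` has the default 0, 1 index. A grouped result like the one in the question is indexed by `measure_id`, so the new column could come out as all NaN. Assigning the index directly, or calling `reset_index()`, sidesteps that. The index values 10 and 20 and the variable names here are hypothetical.

```python
import pandas as pd

# Made-up stand-in for a grouped result indexed by measure_id (an assumption).
stats = pd.DataFrame(
    {'Average': [74.3, 70.7], 'Maximum': [99, 97]},
    index=pd.Index([10, 20], name='measure_id'),
)

# pd.Series(...) gets a fresh 0..n-1 index, which shares no labels with
# stats' index (10, 20), so every value in the new column is NaN.
misaligned = stats.copy()
misaligned['Measure ID'] = pd.Series(stats.index.values)

# Assigning the Index object itself is positional, so the values line up.
by_index = stats.copy()
by_index['Measure ID'] = stats.index

# reset_index() moves the index into a regular column in one step.
via_reset = stats.reset_index().rename(columns={'measure_id': 'Measure ID'})

print(misaligned)
print(by_index)
print(via_reset)
```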

I assume you want the columns in the same order as in your example, so here is how to reorder them.

Let's say this is your DataFrame:

   Average  Maximum  Minimum  Standard Deviation  Measure ID Measure Name
0        0        0        1                   1           0        Place
1        1        0        1                 324           1       Holder

Then we can do this:

>>> cols = df.columns.tolist()
>>> cols
['Average',
 'Maximum',
 'Minimum',
 'Standard Deviation',
 'Measure ID',
 'Measure Name']
>>> cols = cols[-2:] + cols[:-2]
>>> cols
['Measure ID',
 'Measure Name',
 'Average',
 'Maximum',
 'Minimum',
 'Standard Deviation']
>>> df = df[cols]
>>> df
   Measure ID Measure Name  Average  Maximum  Minimum  Standard Deviation
0           0        Place        0        0        1                   1
1           1       Holder        1        0        1                 324
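
Slicing the column list works when the two columns to move happen to be last. As a small sketch of an alternative, you can spell out the desired order explicitly, which does not depend on where the columns currently sit. The frame below just recreates the placeholder data shown above; the name `wanted` is only for illustration.

```python
import pandas as pd

# Recreate the placeholder DataFrame from the example above.
df = pd.DataFrame({
    'Average': [0, 1],
    'Maximum': [0, 0],
    'Minimum': [1, 1],
    'Standard Deviation': [1, 324],
    'Measure ID': [0, 1],
    'Measure Name': ['Place', 'Holder'],
})

# The target column order, written out in full.
wanted = ['Measure ID', 'Measure Name', 'Average',
          'Maximum', 'Minimum', 'Standard Deviation']

# Indexing with a list of labels returns the columns in that order.
reordered = df[wanted]

# reindex(columns=...) gives the same result here; unlike plain indexing,
# it would add an all-NaN column for a missing label instead of raising.
reindexed = df.reindex(columns=wanted)

print(reordered)
```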