I have the following DataFrame:
nationwide_measures = pd.read_sql_query("""select state,
measure_id,
measure_name,
score
from timely_and_effective_care___hospital;""", conn)
I created this function:
# Function to grab measure values
def get_stats(group):
    df = pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}, index = [0])
    return df
# Function output
nationwide_measure_results = nationwide_measures1['score'].groupby(nationwide_measures1['measure_id']).apply(get_stats).unstack()
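The output is a DataFrame with the following 5 columns:
"index" | ('Average', 0) | ('Maximum', 0) | ('Minimum', 0) | ('Standard Deviation', 0)
How can I change the output so that it has these 6 renamed columns:
"Measure ID" | "Measure Name" | "Average" | "Maximum" | "Minimum" | "Standard Deviation"
I tried: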
df = pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}, index = [0], columns=["Measure ID", "Average", "Maximum", "Minimum", "Standard Deviation"])
and
df.columns = ["Measure ID", "Average", "Maximum", "Minimum", "Standard Deviation"]
inside the function, but neither worked.
Answer 0 (score: 0)
Let's try this example.
import pandas as pd
import numpy as np
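# Build random sample data shaped like the table in the question
df = pd.DataFrame({'state': np.random.choice(['TX', 'CA', 'NY'], 100),
                   'measure_id': np.random.randint(1, 5, 100),
                   'measure_name': np.nan,
                   'score': np.random.randint(50, 100, 100)})
# Named measure_names rather than dict so the built-in dict is not shadowed
measure_names = {1: 'Measure A', 2: 'Measure B', 3: 'Measure C', 4: 'Measure D', 5: 'Measure E'}
df['measure_name'] = df['measure_id'].map(measure_names)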
Input data:
   measure_id  measure_name  score  state
0           3     Measure C     82     CA
1           3     Measure C     93     CA
2           4     Measure D     69     NY
3           1     Measure A     56     NY
4           4     Measure D     66     CA
df_out=(df.groupby(['measure_id','measure_name'])['score'].agg(['mean','max','min','std'])
.rename(columns={'mean':'Average','max':'Maximum','min':'Minimum','std':'Standard Deviation'})
.rename_axis(['Measure ID','Measure Name'])
.reset_index())
print(df_out)
Output:
   Measure ID  Measure Name    Average  Maximum  Minimum  Standard Deviation
0           1     Measure A  74.346154       99       53           13.734460
1           2     Measure B  70.720000       97       50           16.084465
2           3     Measure C  76.130435       97       51           14.943239
3           4     Measure D  77.576923       97       56           10.756107
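As a possible alternative to agg followed by rename, pandas 0.25 and later support named aggregation, which sets the output column names inside the agg call itself. The sketch below uses a tiny hand-made table (the values are made up for illustration, not taken from the question's database) and assumes pandas >= 0.25. The dictionary unpacking is only needed because 'Standard Deviation' contains a space and so cannot be written as a keyword argument.

```python
import pandas as pd

# Hypothetical miniature of the question's table (illustrative values only)
df = pd.DataFrame({
    'measure_id': [1, 1, 2, 2],
    'measure_name': ['Measure A', 'Measure A', 'Measure B', 'Measure B'],
    'score': [50, 70, 60, 90],
})

# Named aggregation: each key is an output column, each value is
# a (source column, aggregation function) pair
df_out = (
    df.groupby(['measure_id', 'measure_name'])
      .agg(**{
          'Average': ('score', 'mean'),
          'Maximum': ('score', 'max'),
          'Minimum': ('score', 'min'),
          'Standard Deviation': ('score', 'std'),
      })
      .rename_axis(['Measure ID', 'Measure Name'])
      .reset_index()
)
print(df_out)
```

The result should have the same six columns as the output above, without the separate rename step.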
Answer 1 (score: 0)
First, here is one way to add a new column to a DataFrame.
df['Measure ID'] = pd.Series(df.index.values)
Example
>>> import pandas as pd
>>> df = pd.DataFrame({'Minimum': [1,1], 'Maximum': [0,0], 'Average': [0,1], 'Standard Deviation': [1,324]}, index = [0,1], columns=["Average", "Maximum", "Minimum", "Standard Deviation"])
>>> df
   Average  Maximum  Minimum  Standard Deviation
0        0        0        1                   1
1        1        0        1                 324
>>> df['Measure ID'] = pd.Series(df.index.values)
>>> df
   Average  Maximum  Minimum  Standard Deviation  Measure ID
0        0        0        1                   1           0
1        1        0        1                 324           1
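It's hard to give you exactly what you need, because we don't have all of the required input. But you can add new columns by following the same pattern: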
df['NEW COLUMN NAME'] = pd.Series(NEW_COLUMN_DATA)
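One caveat with this pattern: assigning a pd.Series aligns on index labels, not on position. The example above works because its index is 0 and 1, but a grouped result indexed by measure ID would probably not line up. The sketch below illustrates this with a hypothetical stats table (the index values 3 and 7 are made up) and shows two ways around it.

```python
import pandas as pd

# Hypothetical stats table indexed by measure id rather than 0..n-1
stats = pd.DataFrame({'Average': [74.3, 70.7]}, index=[3, 7])

# A Series built from raw values gets a fresh 0..n-1 index, so none of
# its labels match 3 or 7 and the new column is filled with NaN
misaligned = stats.copy()
misaligned['Measure ID'] = pd.Series(misaligned.index.values)

# Assigning the index itself is positional, so the values land correctly
aligned = stats.copy()
aligned['Measure ID'] = aligned.index

# Naming the index and resetting it gives the same column in one step
via_reset = stats.rename_axis('Measure ID').reset_index()

print(misaligned)
print(aligned)
print(via_reset)
```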
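I assume you want the columns in the same order as in your example, so here is how to reorder them.
Let's say this is your DataFrame:
   Average  Maximum  Minimum  Standard Deviation  Measure ID  Measure Name
0        0        0        1                   1           0         Place
1        1        0        1                 324           1        Holder
Then we can do this: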
>>> cols = df.columns.tolist()
>>> cols
['Average',
 'Maximum',
 'Minimum',
 'Standard Deviation',
 'Measure ID',
 'Measure Name']
>>> cols = cols[-2:] + cols[:-2]
>>> cols
['Measure ID',
 'Measure Name',
 'Average',
 'Maximum',
 'Minimum',
 'Standard Deviation']
>>> df = df[cols]
>>> df
   Measure ID  Measure Name  Average  Maximum  Minimum  Standard Deviation
0           0         Place        0        0        1                   1
1           1        Holder        1        0        1                 324
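If the final order is known up front, listing the column names explicitly may be easier to read than slicing the list, and it does not depend on where the new columns happen to sit. A minimal sketch, rebuilding the placeholder frame from this answer:

```python
import pandas as pd

# Same placeholder data as in the example above
df = pd.DataFrame({
    'Average': [0, 1],
    'Maximum': [0, 0],
    'Minimum': [1, 1],
    'Standard Deviation': [1, 324],
    'Measure ID': [0, 1],
    'Measure Name': ['Place', 'Holder'],
})

# Spell out the desired order; selecting with a list returns the
# columns in exactly that order
wanted = ['Measure ID', 'Measure Name', 'Average',
          'Maximum', 'Minimum', 'Standard Deviation']
df = df[wanted]
print(df)
```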