Loop through a list of pandas DataFrames and create a new DataFrame

Asked: 2017-12-22 16:09:29

Tags: python-2.7 pandas

I have a list of DataFrames:

data_frames = [sort,sort1,sort2]

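I want to iterate over them and store some statistics in a new df. I feel this should be trivial, but the function below returns an empty DataFrame when called as df_concat = df_stats(data_frames). What am I missing? Any help is much appreciated.

Creating the sample dataset: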

import pandas as pd
data = {'number': [23,56,89], 'PVs': [23456, 34456, 6789]}
sort = pd.DataFrame.from_dict(data)
data1 = {'number': [28,52,12], 'PVs': [3423456, 2334456, 36789]}
sort1 = pd.DataFrame.from_dict(data1)
data2 = {'number': [123,5,86], 'PVs': [2345655, 934456, 16789]}
sort2 = pd.DataFrame.from_dict(data2)

The function that iterates over the DataFrames:

def df_stats(data_frames):
    df = pd.DataFrame()
    for data in data_frames:
        df['Number'] = data.number.count()
        df["Total PVs"] = '{0:,.0f}'.format(data.PVs.sum())
        df["Average"] = '{0:,.0f}'.format(data.PVs.mean())
        df["Median"] = '{0:,.0f}'.format(data.PVs.median())
    return df

3 Answers:

Answer 0 (score: 2)

We can use pd.concat + groupby instead of a loop

pd.concat(data_frames,keys=[1,2,3]).groupby(level=0).agg({'number':'count','PVs':['sum','mean','median']})
Out[1117]: 
  number      PVs                       
   count      sum          mean   median
1      3    64701  2.156700e+04    23456
2      3  5794701  1.931567e+06  2334456
3      3  3296900  1.098967e+06   934456
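If the flat, formatted layout from the question is wanted, the grouped result can be post-processed. The following is only a sketch: the flattened column names and the rename mapping are my own choices, not part of the answer above.

```python
import pandas as pd

# Sample frames from the question
sort = pd.DataFrame({'number': [23, 56, 89], 'PVs': [23456, 34456, 6789]})
sort1 = pd.DataFrame({'number': [28, 52, 12], 'PVs': [3423456, 2334456, 36789]})
sort2 = pd.DataFrame({'number': [123, 5, 86], 'PVs': [2345655, 934456, 16789]})
data_frames = [sort, sort1, sort2]

summary = (
    pd.concat(data_frames, keys=[1, 2, 3])
    .groupby(level=0)
    .agg({'number': 'count', 'PVs': ['sum', 'mean', 'median']})
)

# Flatten the two-level column index, e.g. ('PVs', 'sum') -> 'PVs_sum'
summary.columns = ['_'.join(col) for col in summary.columns]

# Rename by name rather than by position, so column order does not matter
summary = summary.rename(columns={
    'number_count': 'Number',
    'PVs_sum': 'Total PVs',
    'PVs_mean': 'Average',
    'PVs_median': 'Median',
})

# Apply the thousands-separator format used in the question
for col in ['Total PVs', 'Average', 'Median']:
    summary[col] = summary[col].map('{0:,.0f}'.format)

print(summary)
```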

Also, if you want to keep your own function, you can fix it like this

df = pd.DataFrame()
for i,data in enumerate(data_frames):
    df.at[i,'Number'] = data.number.count()
    df.at[i,"Total PVs"] = '{0:,.0f}'.format(data.PVs.sum())
    df.at[i,"Average"] = '{0:,.0f}'.format(data.PVs.mean())
    df.at[i,"Median"] = '{0:,.0f}'.format(data.PVs.median())



df
Out[1121]: 
   Number  Total PVs    Average     Median
0     3.0     64,701     21,567     23,456
1     3.0  5,794,701  1,931,567  2,334,456
2     3.0  3,296,900  1,098,967    934,456
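For context on why the original function came back empty: assigning a scalar to a column of a DataFrame that has no rows creates the column but broadcasts the value over zero rows. The sketch below demonstrates that, then shows one possible alternative that collects one dict per frame and builds the result once. The name summarize is hypothetical and not from the question.

```python
import pandas as pd

# A scalar assigned to a row-less DataFrame adds a column but no rows
empty = pd.DataFrame()
empty['Number'] = 3
print(len(empty))


def summarize(frames):
    """Hypothetical helper: return one row of statistics per input frame."""
    rows = []
    for frame in frames:
        rows.append({
            'Number': frame['number'].count(),
            'Total PVs': '{0:,.0f}'.format(frame['PVs'].sum()),
            'Average': '{0:,.0f}'.format(frame['PVs'].mean()),
            'Median': '{0:,.0f}'.format(frame['PVs'].median()),
        })
    # Passing columns= fixes the column order explicitly
    return pd.DataFrame(rows, columns=['Number', 'Total PVs', 'Average', 'Median'])


# Sample frames from the question
sort = pd.DataFrame({'number': [23, 56, 89], 'PVs': [23456, 34456, 6789]})
sort1 = pd.DataFrame({'number': [28, 52, 12], 'PVs': [3423456, 2334456, 36789]})
sort2 = pd.DataFrame({'number': [123, 5, 86], 'PVs': [2345655, 934456, 16789]})

stats = summarize([sort, sort1, sort2])
print(stats)
```

Building the frame once from a list of rows keeps Number as an integer column, whereas cell-by-cell enlargement with df.at produced 3.0 in the output above.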

Answer 1 (score: 1)

Try this:

import pandas as pd

# Example DataFrames
data1 = pd.DataFrame({'number': [23,56,89], 'PVs': [23456, 34456, 6789]},
                     columns=['number', 'PVs'])
data2 = pd.DataFrame({'number': [28,52,12], 'PVs': [3423456, 2334456, 36789]},
                     columns=['number', 'PVs'])
data3 = pd.DataFrame({'number': [123,5,86], 'PVs': [2345655, 934456, 16789]},
                     columns=['number', 'PVs'])


# Function returning the stats for a single DataFrame as a one-row DataFrame
def df_stats(dataFrame):
    columns = ['Number', 'Total PVs', 'Average', 'Median']
    stats = {
        'Number': [dataFrame['number'].count()],
        'Total PVs': ['{0:,.0f}'.format(dataFrame['PVs'].sum())],
        'Average': ['{0:,.0f}'.format(dataFrame['PVs'].mean())],
        'Median': ['{0:,.0f}'.format(dataFrame['PVs'].median())],
    }
    return pd.DataFrame(stats, columns=columns)


# Create a list of DataFrames to iterate over
data_frames = [data1, data2, data3]

# Create an empty DataFrame so you can include it in pd.concat()
result = pd.DataFrame()


# Iterate over the DataFrame list and concatenate the per-frame stats
for dataFrame in data_frames:
    tempDF = df_stats(dataFrame)
    result = pd.concat([result,tempDF], ignore_index=True)

result.head(3)

The output is:

    Number  Total PVs   Average    Median
0   3       64,701      21,567     23,456
1   3       5,794,701   1,931,567  2,334,456
2   3       3,296,900   1,098,967  934,456

Answer 2 (score: 0)

The following function works

dict_df ={'df1':sort1,'df':sort,'df2':sort2}


def df_stats(dict_df):
    df = pd.DataFrame(columns=['Number','Total PVs','Average','Median'],index=dict_df.keys())   
    for name,data in dict_df.items():
        df.loc[name,"Number"] = data.number.count()
        df.loc[name,"Total PVs"] = '{0:,.0f}'.format(data.PVs.sum())
        df.loc[name,"Average"] = '{0:,.0f}'.format(data.PVs.mean())
        df.loc[name,"Median"] = '{0:,.0f}'.format(data.PVs.median())
    return df

Output:

          Number    Total PVs   Average     Median
     df2    3      3,296,900    1,098,967   934,456
     df1    3      5,794,701    1,931,567   2,334,456
     df     3      64,701       21,567      23,456
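The rows above appear in a different order from the dict literal, which is expected on Python 2.7, where plain dicts do not preserve insertion order. If a fixed row order matters, one option is collections.OrderedDict. This is a sketch only; the name df_stats_ordered is hypothetical.

```python
from collections import OrderedDict

import pandas as pd

# Sample frames from the question
sort = pd.DataFrame({'number': [23, 56, 89], 'PVs': [23456, 34456, 6789]})
sort1 = pd.DataFrame({'number': [28, 52, 12], 'PVs': [3423456, 2334456, 36789]})
sort2 = pd.DataFrame({'number': [123, 5, 86], 'PVs': [2345655, 934456, 16789]})

# An OrderedDict keeps the names in the order they were inserted
named_frames = OrderedDict([('df', sort), ('df1', sort1), ('df2', sort2)])


def df_stats_ordered(named):
    """Hypothetical variant of df_stats with a deterministic row order."""
    df = pd.DataFrame(columns=['Number', 'Total PVs', 'Average', 'Median'],
                      index=list(named.keys()))
    for name, data in named.items():
        df.loc[name, 'Number'] = data.number.count()
        df.loc[name, 'Total PVs'] = '{0:,.0f}'.format(data.PVs.sum())
        df.loc[name, 'Average'] = '{0:,.0f}'.format(data.PVs.mean())
        df.loc[name, 'Median'] = '{0:,.0f}'.format(data.PVs.median())
    return df


ordered_stats = df_stats_ordered(named_frames)
print(ordered_stats)
```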