Using multiple returned datasets from different functions in Python pandas

Time: 2017-05-11 19:14:14

Tags: python function csv pandas

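I am working with 3 datasets, and I wrote 3 different functions, one per dataset, to do some data cleaning and manipulation. At the end, I want to combine all 3 cleaned datasets in another function.

My logic: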

import pandas as pd

def function1():
    # read in data as df
    data1 = df[(df.column1 != "") & (df.column2 != 'MRN') & (df.column3 != "C")]
    return data1.to_csv()

def function2():
    # read in data as df
    data2 = df[(df.column1 != "A") & (df.column2 != 'M') & (df.column3 != " ")]
    return data2.to_csv()

def function3():
    # read in data as df
    data3 = df[(df.column1 != "B") & (df.column2 != 'N') & (df.column3 != " ")]
    return data3.to_csv()

def combinedatasets():
    # merge data1, data2, data3 into combineddata
    return combineddata.to_csv()

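Right now I write data1, data2, and data3 to the directory as new files. Is there a way to hold them temporarily inside the script, so that those 3 files are not written and only combineddata.csv is output? How do I get the temporary datasets data1, data2, and data3 from the first 3 functions into my combinedatasets function so I can combine them?

Something like this: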

import pandas as pd

def function1():
    # read in data as df
    data1 = df[(df.column1 != "") & (df.column2 != 'MRN') & (df.column3 != "C")]
    # return temporary data1 without outputting it

def function2():
    # read in data as df
    data2 = df[(df.column1 != "A") & (df.column2 != 'M') & (df.column3 != " ")]
    # return temporary data2 without outputting it

def function3():
    # read in data as df
    data3 = df[(df.column1 != "B") & (df.column2 != 'N') & (df.column3 != " ")]
    # return temporary data3 without outputting it

def combinedatasets():
    # call temporary data1, data2, data3 and
    # merge them into combineddata
    return combineddata.to_csv('combineddata.csv')  # output as a csv file

That way only 'combineddata.csv' would be written to the folder.

1 Answer:

Answer 0: (score: 1)

Simply assign the result of the function call to a variable, since the function returns a DataFrame:

import pandas as pd

def myfunction():
    data = pd.read_csv('Input.csv')
    # process dataframe...
    return data

def combinedatasets():
    df = myfunction()
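
As a minimal sketch of that pattern applied to the question's first filter: the function below returns the cleaned DataFrame instead of calling to_csv, so nothing is written to disk. The sample data, the `source` parameter, and the use of `io.StringIO` in place of a real file are assumptions made only to keep the example self-contained.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the question's unspecified input file.
SAMPLE_CSV = """column1,column2,column3
x,MRN,C
y,ABC,D
,DEF,E
"""

def function1(source):
    """Read a CSV and return the cleaned rows as a DataFrame (no file is written)."""
    # keep_default_na=False keeps empty fields as "" instead of NaN,
    # so the comparison with "" behaves as the question intends.
    df = pd.read_csv(source, keep_default_na=False)
    mask = (df["column1"] != "") & (df["column2"] != "MRN") & (df["column3"] != "C")
    return df[mask]

# The caller holds the result in a variable; it lives only in memory.
data1 = function1(io.StringIO(SAMPLE_CSV))
print(data1)
```

Passing a file path instead of the `io.StringIO` object should work the same way, since `pd.read_csv` accepts either.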

Or assign all of them at once:

def combinedatasets():
    data1, data2, data3 = function1(), function2(), function3()
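
To illustrate that only the combined file ends up on disk, here is a small sketch. The three function bodies, the `out_path` parameter, and the temporary output directory are hypothetical stand-ins; the real functions would do the reading and cleaning described in the question.

```python
import os
import tempfile
import pandas as pd

# Hypothetical stand-ins for the three cleaning functions; each returns a DataFrame.
def function1():
    return pd.DataFrame({"column1": ["a", "b"], "column2": [1, 2]})

def function2():
    return pd.DataFrame({"column1": ["c"], "column2": [3]})

def function3():
    return pd.DataFrame({"column1": ["d", "e"], "column2": [4, 5]})

def combinedatasets(out_path):
    # The intermediate DataFrames exist only as local variables.
    data1, data2, data3 = function1(), function2(), function3()
    # Stack the rows and renumber the index from 0.
    combined = pd.concat([data1, data2, data3], ignore_index=True)
    # This is the only place a file is written.
    combined.to_csv(out_path, index=False)
    return combined

out_dir = tempfile.mkdtemp()
combined = combinedatasets(os.path.join(out_dir, "combineddata.csv"))
print(os.listdir(out_dir))
```

Because data1, data2, and data3 are never passed to to_csv, the output directory contains nothing but the combined file.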

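However, avoid keeping several similarly structured DataFrames in your environment. Store them in a single list instead, which you can then merge or append: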

def combinedatasets():
    dfList = [function1(), function2(), function3()]

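    # MERGE/COLUMN BIND, aligned on the first DataFrame's index
    # (join_axes was removed in pandas 1.0; reindex the result instead)
    combinedf = pd.concat(dfList, axis=1).reindex(dfList[0].index)
    combinedf.to_csv('CombinedWideData.csv')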

    # APPEND/ROW BIND
    combinedf = pd.concat(dfList)
    combinedf.to_csv('CombinedLongData.csv')
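
If the three datasets need to be merged on a shared key rather than bound by index, one possible approach is to chain pd.merge over the list. This is only a sketch: the key column name "id" and the sample frames are assumptions, since the question does not say which columns the datasets have in common.

```python
from functools import reduce
import pandas as pd

# Hypothetical cleaned datasets sharing an assumed key column "id".
dfList = [
    pd.DataFrame({"id": [1, 2, 3], "a": ["x", "y", "z"]}),
    pd.DataFrame({"id": [1, 2], "b": [10, 20]}),
    pd.DataFrame({"id": [2, 3], "c": [0.5, 0.7]}),
]

# Merge the frames pairwise, left to right; how="outer" keeps ids
# that are missing from some of the frames (filled with NaN).
merged = reduce(
    lambda left, right: pd.merge(left, right, on="id", how="outer"),
    dfList,
)
print(merged)
```

Switching to how="inner" would keep only the ids present in every dataset.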