Question

I am doing some data analysis, and the data is in a pandas DataFrame, df.

I have defined several functions to process df.

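For encapsulation, I define the functions like this:

def df_process(df):
    df = df.copy()
    # do some process work on df
    return df

In Jupyter Notebook, I use the function as

df = df_process(df)

The reason for using df.copy() is that otherwise the original df would be modified, whether or not you assign the result back. (See Python & Pandas: How to return a copy of a dataframe?)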
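
The behaviour described above can be reproduced with a minimal sketch. The frame contents and the two function names (process_in_place, process_on_copy) are made up for illustration and are not from the original post.

```python
import pandas as pd

def process_in_place(df):
    # Hypothetical processing step: adds a column directly on the object passed in.
    df["doubled"] = df["value"] * 2
    return df

def process_on_copy(df):
    # Same step, but working on a copy so the caller's frame is left alone.
    df = df.copy()
    df["doubled"] = df["value"] * 2
    return df

original = pd.DataFrame({"value": [1, 2, 3]})

result = process_on_copy(original)
copy_left_original_alone = "doubled" not in original.columns  # caller's frame unchanged

process_in_place(original)  # return value deliberately ignored
in_place_changed_original = "doubled" in original.columns  # caller's frame was mutated
```

Both flags should end up True: the copying version leaves the caller's frame alone, while the in-place version changes it even though its return value is never assigned.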

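My questions are:

Is using df = df.copy() appropriate here? If not, how should a function that processes the data be defined?
Since I use several such data-processing functions, will it affect the performance of my program? And by how much?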
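
For the second question, the cost depends mostly on how much data the frame holds, so the simplest way to judge it is to time a copy on a frame of representative size. The sketch below assumes a hypothetical frame of 1,000,000 rows and 5 float columns; the numbers it prints will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical frame: 1,000,000 rows x 5 float64 columns (about 40 MB of data).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1_000_000, 5)), columns=list("abcde"))

start = time.perf_counter()
df_copy = df.copy()  # deep copy by default: data and index are duplicated
elapsed = time.perf_counter() - start

size_mb = df.memory_usage(deep=True).sum() / 1e6
print(f"copied {size_mb:.1f} MB in {elapsed * 1000:.2f} ms")
```

Comparing that figure with the time the processing itself takes gives a rough idea of whether the copies matter for a given workload.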

Answer 1

Better would be:

def df_process(df):
    # do some process work on df (modifies the frame it is given)
    pass

def df_another(df):
    # other processing
    pass

def df_more(df):
    # yet more processing
    pass

def process_many(df):
    for frame_function in (df_process, df_another, df_more):
        df_copy = df.copy()  # fresh copy for each function; the original df is untouched
        frame_function(df_copy)
        # emit the results to a file or screen or whatever

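The key here is that if you must make a copy, make only one copy, process it, store the results somewhere, and then dispose of it by reassigning df_copy. Your question doesn't mention why you would want to hang on to a processed copy, so this assumes you don't need to.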
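
A concrete version of this pattern might look like the sketch below. The processing functions add_total and scale_a, the frame contents, and the choice to keep only column sums as the stored result are all assumptions made for illustration.

```python
import pandas as pd

def add_total(df):
    # Hypothetical in-place step: adds a column to the frame it receives.
    df["total"] = df["a"] + df["b"]

def scale_a(df):
    # Hypothetical in-place step: overwrites an existing column.
    df["a"] = df["a"] * 10

def process_many(df, frame_functions):
    summaries = {}
    for frame_function in frame_functions:
        df_copy = df.copy()  # rebinding the name drops the previous copy
        frame_function(df_copy)
        # Keep only a small result instead of the whole processed copy.
        summaries[frame_function.__name__] = df_copy.sum().to_dict()
    return summaries

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
summaries = process_many(df, (add_total, scale_a))
```

Only one processed copy is referenced at any time, because each loop iteration rebinds df_copy, and the original df keeps its two columns and original values throughout.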

Python & Pandas: Will using a lot of df.copy affect the performance of the code?
