Question

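I come from an R background, where I never ran into this problem.

Typically, the functions I have written take a data frame and return a modified version of it. For example: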

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10]})

def multiply_function(dataset):
    # Multiply the first two columns and store the result in a new column
    dataset['output'] = dataset.iloc[:, 1] * dataset.iloc[:, 0]
    return dataset

new_df = multiply_function(df)
new_df # looks good!
df # I would expect that df stays the same and isn't updated with the new column
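
A minimal sketch of why df changes, assuming standard pandas and Python semantics: the function receives a reference to the same DataFrame object, so assigning a column inside it mutates the caller's object. The function name add_output and the smaller frame are hypothetical, used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def add_output(dataset):
    # Column assignment mutates the object that was passed in
    dataset["output"] = dataset.iloc[:, 0] * dataset.iloc[:, 1]
    return dataset

new_df = add_output(df)

# new_df and df are two names for the same object
same_object = new_df is df
has_column = "output" in df.columns
print(same_object, has_column)
```

Both checks should come out true, which is why the new column shows up in df as well.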

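I am porting a large amount of functions and code from one language to the other. I want to avoid this problem, so that df is not updated globally as a side effect of what happens inside a function.

This matters especially when I rerun or modify code, because a data frame may not be able to go through the same function twice and still produce valid results.

I have seen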

dataset = dataset.copy()

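used as the first line of a function... but is that really ideal? Is there a better way to handle this? I would expect it to really blow up the amount of data held in memory when working with large datasets.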
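
One possible alternative to an explicit copy, sketched here under the assumption that only new columns need to be added: DataFrame.assign returns a new DataFrame and leaves the input untouched. The name multiply_function_pure is hypothetical. This is not necessarily cheaper in memory, since assign generally works on a copy of the frame internally.

```python
import pandas as pd

def multiply_function_pure(dataset):
    # assign returns a new DataFrame; the caller's object is not modified
    return dataset.assign(output=dataset.iloc[:, 1] * dataset.iloc[:, 0])

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10]})
new_df = multiply_function_pure(df)

print(list(df.columns))      # original columns only
print(list(new_df.columns))  # original columns plus "output"
```

Because the function no longer mutates its argument, running it on df twice gives the same result each time.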

Thanks!

Understanding local vs. global variables and creating functions

0 answers: