What is a good way to prevent changes from being applied to the original dataframe?

Time: 2019-02-13 21:55:45

Tags: python python-3.x pandas dataframe

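I am trying to pass a dataframe through several commands (to prepare a series of arguments for a function). However, when I assign one dataframe to another, the two seem to be the same thing: after assigning a dataframe to a new name, every change I make also shows up in the original. What is a good way to keep the original dataframe in its original state, so that it can be handed to other commands for different changes?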

See the example below.

# Merge several dataframes
import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'eTIV': [1.12, 2.22, 3.43, 5.43]})
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Nose': [1, 2, 3, 5]})
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Eye_Vol': [1, 2, 3, 5]})
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'], 'Finger': [1.3, 2.123, 3.4, 5.5]})

dfs = [df1, df2, df3, df4, df5]

df_final = reduce(lambda left, right: pd.merge(left, right, on='ID'), dfs)
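For readers unfamiliar with the `reduce` idiom used above: it folds `pd.merge` over the list from left to right, so the result equals a chain of pairwise merges, and each merge returns a new dataframe without modifying its inputs. A minimal sketch, assuming only the first three example frames (rebuilt here so the snippet runs on its own):

```python
from functools import reduce

import pandas as pd

ids = ['Mary', 'Mike', 'Barry', 'Scotty']
df1 = pd.DataFrame({'ID': ids, 'eTIV': [1.12, 2.22, 3.43, 5.43]})
df2 = pd.DataFrame({'ID': ids, 'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ids, 'Nose': [1, 2, 3, 5]})

# reduce(f, [df1, df2, df3]) evaluates f(f(df1, df2), df3)
folded = reduce(lambda left, right: pd.merge(left, right, on='ID'), [df1, df2, df3])
chained = pd.merge(pd.merge(df1, df2, on='ID'), df3, on='ID')

print(folded.equals(chained))  # True
print(list(folded.columns))    # ['ID', 'eTIV', 'Ear_Vol', 'Nose']
print(list(df1.columns))       # ['ID', 'eTIV']  (inputs are not modified)
```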

df_final

       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500

Assigning the dataframe to other names and manipulating it:

df = df_final
df_raw = df
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)

The new dataframe (as expected):

df_raw
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500

For some reason, the original dataframe was changed too (why does the assignment change the original here?):

df

       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500

2 answers:

Answer 0 (score: 3)

If you want to copy the dataframe and create a new object, use .copy():

# Merge several dataframes
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'eTIV': [1.12, 2.22, 3.43, 5.43], })
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Nose': [1, 2, 3, 5], })
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Eye_Vol': [1, 2, 3, 5], })
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Finger': [1.3, 2.123, 3.4, 5.5], })

dfs = [df1, df2, df3, df4,df5]

df_final = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)

df_final
df = df_final

print(df is df_final)  # Prints True: both names refer to the same dataframe.

df_raw = df.copy()  # Modified: copy() instead of plain assignment

print(df is df_raw)  # Prints False: copy() created a new, independent dataframe object.
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)
print(df_raw)
print(df)  # No longer affected by changes to df_raw
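`copy()` defaults to `deep=True`, so the copy owns its own data as well as its own labels; changing a cell in the copy does not leak back either. A small sketch, assuming a two-row frame invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['Mary', 'Mike'], 'Ear_Vol': [5, 6]})
df_raw = df.copy()  # deep=True is the default: data and index are copied

# Change both a column label and a cell value in the copy only.
df_raw.columns = ['ID', 'Ear_Vol_Raw']
df_raw.loc[0, 'Ear_Vol_Raw'] = 99

print(list(df.columns))              # ['ID', 'Ear_Vol']
print(df.loc[0, 'Ear_Vol'])          # 5
print(df_raw.loc[0, 'Ear_Vol_Raw'])  # 99
```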

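Simple assignment behaves this way because in Python, names refer to objects. Assignment just gives you two labels that both point to the same underlying dataframe object, so when that object is modified, every name bound to it reflects the change. Good further reading here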
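The distinction between mutating an object and rebinding a name can be seen directly. This is a minimal sketch on a made-up two-row frame (the name `alias` is only illustrative): setting `.columns` mutates the shared object, while assigning a brand-new object to a name leaves the old object alone.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['Mary', 'Mike'], 'Ear_Vol': [5, 6]})

alias = df              # binds a second name to the SAME object
print(alias is df)      # True

# Mutating the object through either name is visible through both names.
alias.columns = ['ID', 'Ear_Vol_Raw']
print(list(df.columns))  # ['ID', 'Ear_Vol_Raw']

# Rebinding a name to a new object does not touch the old object.
alias = pd.DataFrame({'x': [1]})
print(alias is df)       # False
print(list(df.columns))  # ['ID', 'Ear_Vol_Raw']
```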

Answer 1 (score: 0)

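If you want to copy and rename the columns, you can do both in one step with rename, which returns a new dataframe and by default copies the underlying data: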

df_raw = df.rename(axis='columns', mapper=lambda s: s.replace(r"_Vol", "_Vol_Raw"))

print(df)
print(df_raw)

Output

       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
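`rename` also accepts the `columns=` keyword, either with a function (equivalent to `axis='columns', mapper=...`) or with an explicit dict; in both cases `inplace` defaults to `False`, so a new dataframe is returned and the original is left untouched. A sketch on a reduced frame assumed for illustration:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['Mary', 'Mike'], 'Ear_Vol': [5, 6], 'Eye_Vol': [1, 2]})

# A function applied to every column label
by_func = df.rename(columns=lambda s: s.replace('_Vol', '_Vol_Raw'))

# An explicit mapping; labels missing from the dict are kept as-is
by_dict = df.rename(columns={'Ear_Vol': 'Ear_Vol_Raw', 'Eye_Vol': 'Eye_Vol_Raw'})

print(list(by_func.columns))    # ['ID', 'Ear_Vol_Raw', 'Eye_Vol_Raw']
print(by_func.equals(by_dict))  # True
print(list(df.columns))         # ['ID', 'Ear_Vol', 'Eye_Vol']
```

The dict form is more explicit when only a few known columns change; the function form scales better when a naming pattern applies to many columns.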