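I am trying to pass a DataFrame through a series of commands (to prepare a set of arguments for a function). However, when I assign one DataFrame to another, the assignment seems to make the two equivalent. In other words, after assigning the DataFrame to a new name, every change is applied to the original as well. What is a good way to keep the original DataFrame in its original state, so that it can be handed to other commands for other changes?
See the example below.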
# Merge several dataframes
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'eTIV': [1.12, 2.22, 3.43, 5.43], })
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Nose': [1, 2, 3, 5], })
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Eye_Vol': [1, 2, 3, 5], })
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Finger': [1.3, 2.123, 3.4, 5.5], })
dfs = [df1, df2, df3, df4,df5]
df_final = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)
df_final
       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500
Assigning the DataFrame to other names and modifying it:
df = df_final
df_raw = df
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)
The new DataFrame (as expected):
df_raw
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
For some reason, the original DataFrame has changed as well (why does the assignment change the original data here?):
df
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
Answer 0 (score: 3)
If you want to copy the DataFrame and create a new object, use .copy().
# Merge several dataframes
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'eTIV': [1.12, 2.22, 3.43, 5.43], })
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Nose': [1, 2, 3, 5], })
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Eye_Vol': [1, 2, 3, 5], })
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Finger': [1.3, 2.123, 3.4, 5.5], })
dfs = [df1, df2, df3, df4,df5]
df_final = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)
df_final
df = df_final
print(df is df_final)  # Prints True: both names refer to the same DataFrame.
df_raw = df.copy()  # Changed: copy instead of plain assignment
print(df is df_raw)  # Prints False: copy() created a new DataFrame object.
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)
print(df_raw)
print(df)  # No longer affected by changes to df_raw
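Plain assignment behaves the way you saw because, in Python, names refer to values. The assignment only gives you two labels that both point to the same underlying DataFrame object. So when the object is modified, every name bound to that object reflects the change. Good further reading here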
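To make the point about names and objects concrete, here is a minimal sketch. The frame and the variable names (original, alias, independent) are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Small illustrative frame; the data is a shortened stand-in for the question's.
original = pd.DataFrame({"ID": ["Mary", "Mike"], "Ear_Vol": [5, 6]})

alias = original               # a second name for the same object
independent = original.copy()  # a new object (copy() is deep by default)

# Renaming columns through the alias mutates the one shared object.
alias.columns = ["ID", "Ear_Vol_Raw"]

same_object = alias is original
original_columns = list(original.columns)
copy_columns = list(independent.columns)

print(same_object)        # the alias and the original are one object
print(original_columns)   # changed through the alias
print(copy_columns)       # the copy keeps the old column names
```

The copy is unaffected because it was created before the mutation; copying after the rename would simply duplicate the already renamed frame.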
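Answer 1 (score: 0)
If you want to copy and rename the columns, you can do both in one step with rename, which by default returns a new DataFrame and leaves the original untouched: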
df_raw = df.rename(axis='columns', mapper=lambda s: s.replace(r"_Vol", "_Vol_Raw"))
print(df)
print(df_raw)
Output:
       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
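Since the question is about passing one frame through several commands, the rename approach could be wrapped in a small helper that never touches its input. This is only a sketch: the function name with_raw_suffix and the shortened sample data are hypothetical, not part of the original code.

```python
import pandas as pd


def with_raw_suffix(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame whose '_Vol' columns are renamed to '_Vol_Raw'.

    Hypothetical helper: rename() returns a new DataFrame, so the
    caller's frame keeps its original column names.
    """
    return frame.rename(columns=lambda name: name.replace("_Vol", "_Vol_Raw"))


df = pd.DataFrame({"ID": ["Mary", "Mike"], "Ear_Vol": [5, 6], "Nose": [1, 2]})
df_raw = with_raw_suffix(df)

print(list(df.columns))      # original names are preserved
print(list(df_raw.columns))  # renamed copy
```

Because the helper returns a new object each time, the same df can be passed to other functions for different changes without an explicit .copy() at every call site.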