I am trying to speed up the following problem. I have an array, for example:
import numpy as np
import pandas as pd
list1 = [0.564,0.011,0.560,-1.100,0.344,0.912,-0.983]
list2 = [0.0,1.0,1.0,0.0,0.0,0.0,-1.0]
list3 = [0.760,0.013,-0.580,1.120,0.144,-0.929,0.833]
list4 = [-1.0,1.0,0.0,1.0,0.0,0.0,1.0]
test_arr = np.column_stack((list1, list2, list3, list4))
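Which gives:
[[ 0.564  0.     0.76  -1.   ]
 [ 0.011  1.     0.013  1.   ]
 [ 0.56   1.    -0.58   0.   ]
 [-1.1    0.     1.12   1.   ]
 [ 0.344  0.     0.144  0.   ]
 [ 0.912  0.    -0.929  0.   ]
 [-0.983 -1.     0.833  1.   ]]
I will always have a column of varying floats (let's call these columns "random_numbers"), followed by another column that contains only a mix of -1.0, 0.0 and 1.0 values (let's call these columns "ones_zeros").
The end goal is to replace every -1.0 or 1.0 value (note: not 0.0) with the value immediately to its left. For this example, the output is:
[[ 0.564  0.     0.76   0.76 ]
 [ 0.011  0.011  0.013  0.013]
 [ 0.56   0.56  -0.58   0.   ]
 [-1.1    0.     1.12   1.12 ]
 [ 0.344  0.     0.144  0.   ]
 [ 0.912  0.    -0.929  0.   ]
 [-0.983 -0.983  0.833  0.833]]
Currently, I convert the numpy array to pandas and apply the following function: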
def replace_values(test_arr_df, random_numbers, ones_zeros):
    for cc in range(len(random_numbers)):
        test_arr_df[ones_zeros[cc]] = test_arr_df.apply(
            lambda row: row[random_numbers[cc]] if row[ones_zeros[cc]]==1 or row[ones_zeros[cc]]==-1
            else row[ones_zeros[cc]], axis=1
        )
    return test_arr_df
Applying it to our test case:
#Convert to dataframe
test_arr_df=pd.DataFrame(test_arr)
#Tell the function what is a variable column and what is a minmax column
variable_columns = [0,2]; minmax_columns = [1,3]
#Replace values
res_df = replace_values(test_arr_df,variable_columns,minmax_columns)
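This pandas approach works, and its result matches the example output above. However, it is very slow. In other parts of my code I have cut processing time by staying with numpy arrays instead of switching to pandas, but I have not managed that here.
So my question is: is there a way to do this with numpy instead of pandas? Or is there a faster way with pandas? I can't make progress because I keep indexing the wrong part or failing to replace the correct rows/columns. Thanks!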
Answer 0: (score: 1)
You can use np.where to replace the values:
import numpy as np
import pandas as pd
list1 = [0.564,0.011,0.560,-1.100,0.344,0.912,-0.983]
list2 = [0.0,1.0,1.0,0.0,0.0,0.0,-1.0]
list3 = [0.760,0.013,-0.580,1.120,0.144,-0.929,0.833]
list4 = [-1.0,1.0,0.0,1.0,0.0,0.0,1.0]
df = pd.DataFrame({0:list1, 1:list2, 2:list3, 3:list4})
df.iloc[:, 1::2] = np.where(df.iloc[:, 1::2].isin([1, -1]), df.iloc[:, ::2], 0)
print(df.to_numpy())
Prints:
[[ 0.564  0.     0.76   0.76 ]
 [ 0.011  0.011  0.013  0.013]
 [ 0.56   0.56  -0.58   0.   ]
 [-1.1    0.     1.12   1.12 ]
 [ 0.344  0.     0.144  0.   ]
 [ 0.912  0.    -0.929  0.   ]
 [-0.983 -0.983  0.833  0.833]]
Edit: a version where the column names are selected explicitly:
import numpy as np
import pandas as pd
list1 = [0.564,0.011,0.560,-1.100,0.344,0.912,-0.983]
list2 = [0.0,1.0,1.0,0.0,0.0,0.0,-1.0]
list3 = [0.760,0.013,-0.580,1.120,0.144,-0.929,0.833]
list4 = [-1.0,1.0,0.0,1.0,0.0,0.0,1.0]
df = pd.DataFrame({'Pressure':list1, 'Pressure 0-1':list2, 'Temperature':list3, 'Temperature 0-1':list4})
df[['Pressure 0-1', 'Temperature 0-1']] = np.where(df[['Pressure 0-1', 'Temperature 0-1']].isin([1, -1]), df[ ['Pressure', 'Temperature'] ], 0)
print(df)
Prints:
   Pressure  Pressure 0-1  Temperature  Temperature 0-1
0     0.564         0.000        0.760            0.760
1     0.011         0.011        0.013            0.013
2     0.560         0.560       -0.580            0.000
3    -1.100         0.000        1.120            1.120
4     0.344         0.000        0.144            0.000
5     0.912         0.000       -0.929            0.000
6    -0.983        -0.983        0.833            0.833
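Since the question asks for a way to stay in numpy, the same np.where idea can also be applied to the array directly. The following is only a sketch under the assumption that the column positions are known up front; it reuses the `variable_columns` and `minmax_columns` lists from the question, while `flags`, `values` and `result` are names introduced here for illustration. Unlike the DataFrame version, it falls back to the original flag value rather than a literal 0, which gives the same result for this data.

```python
import numpy as np

list1 = [0.564, 0.011, 0.560, -1.100, 0.344, 0.912, -0.983]
list2 = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, -1.0]
list3 = [0.760, 0.013, -0.580, 1.120, 0.144, -0.929, 0.833]
list4 = [-1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
test_arr = np.column_stack((list1, list2, list3, list4))

# Column positions as defined in the question.
variable_columns = [0, 2]
minmax_columns = [1, 3]

# Indexing with a list returns copies, so test_arr itself is not modified here.
flags = test_arr[:, minmax_columns]
values = test_arr[:, variable_columns]

# Where the flag is 1 or -1 take the value from the paired float column,
# otherwise keep the flag (0.0) as it is.
result = test_arr.copy()
result[:, minmax_columns] = np.where(np.isin(flags, [1.0, -1.0]), values, flags)

print(result)
```

Because the pairing is given by two explicit lists, this does not rely on the float and flag columns alternating.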
Answer 1: (score: 0)
Here:
for x, y in np.argwhere(np.abs(test_arr) == 1.):
    test_arr[x, y] = test_arr[x, y-1]
Before:
[[ 0.564  0.     0.76  -1.   ]
 [ 0.011  1.     0.013  1.   ]
 [ 0.56   1.    -0.58   0.   ]
 [-1.1    0.     1.12   1.   ]
 [ 0.344  0.     0.144  0.   ]
 [ 0.912  0.    -0.929  0.   ]
 [-0.983 -1.     0.833  1.   ]]
After:
[[ 0.564  0.     0.76   0.76 ]
 [ 0.011  0.011  0.013  0.013]
 [ 0.56   0.56  -0.58   0.   ]
 [-1.1    0.     1.12   1.12 ]
 [ 0.344  0.     0.144  0.   ]
 [ 0.912  0.    -0.929  0.   ]
 [-0.983 -0.983  0.833  0.833]]
Logic: for every coordinate (x, y) whose value is 1 or -1, replace it with the value to its left.
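The logic above can also be written without a Python-level loop by using a boolean mask and index arrays. This is a sketch, not a benchmarked replacement: `flag_cols`, `mask`, `rows` and `cols` are names introduced here, and the flag columns are assumed to sit at positions 1 and 3 as in the question. Restricting the mask to the flag columns is a deliberate difference from the loop: a float column that happened to contain exactly 1.0 or -1.0 would not be touched, and column 0 can never be indexed with y-1 = -1.

```python
import numpy as np

list1 = [0.564, 0.011, 0.560, -1.100, 0.344, 0.912, -0.983]
list2 = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, -1.0]
list3 = [0.760, 0.013, -0.580, 1.120, 0.144, -0.929, 0.833]
list4 = [-1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
test_arr = np.column_stack((list1, list2, list3, list4))

# Assumed positions of the -1/0/1 columns (taken from the question).
flag_cols = np.array([1, 3])

# True wherever a flag column holds 1.0 or -1.0.
mask = np.abs(test_arr[:, flag_cols]) == 1.0

# Row indices, and positions within flag_cols, of the matching cells.
rows, cols = np.nonzero(mask)

# Overwrite each matching cell in place with the value one column to its left.
test_arr[rows, flag_cols[cols]] = test_arr[rows, flag_cols[cols] - 1]

print(test_arr)
```

The right-hand side is evaluated into a temporary array before the assignment, so reading and writing the same array here is safe.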