Can I use the previously calculated answer from apply(axis=1) within the current row evaluation?
I have this df:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3),columns=list('ABC'))
df
A B C String_column
0 0.297925 -1.025012 1.307090 'a'
1 -1.527406 0.533451 -0.650252 'b'
2 -1.646425 0.738068 0.562747 'c'
3 -0.045872 0.088864 0.932650 'd'
4 -0.964226 0.542817 0.873731 'e'
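For each row, I'm trying to add the previous row's value multiplied by 0.5 to the current value, without manipulating the string column (e.g. row = row + row(shift-1) * 0.5).
This is my code so far: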
def calc_by_previous_answer(row):
    # here I have only the current row, so I'm unable to get the previous one
    row = row * 0.5
    return row
# adding the shift here will not propagate the previous answer
df = df.apply(calc_by_previous_answer, axis=1)
df
Answer 0 (score: 1)
This is not easy, but it is possible to select the previous row's values with loc. To work on only the numeric columns, use DataFrame.select_dtypes:
def calc_by_previous_answer(row):
    # here I have only the current row, so I'm unable to get the previous one
    # cannot select the previous row of the first row because it does not exist
    if row.name > 0:
        row = df.loc[row.name-1, c] * 0.5 + row
    # else:
    #     row = row * 0.5
    return row
c = df.select_dtypes(np.number).columns
df[c] = df[c].apply(calc_by_previous_answer, axis=1)
print (df)
A B C String_column
0 0.297925 -1.025012 1.307090 'a'
1 -1.378443 0.020945 0.003293 'b'
2 -2.410128 1.004794 0.237621 'c'
3 -0.869085 0.457898 1.214023 'd'
4 -0.987162 0.587249 1.340056 'e'
Solution without apply, using DataFrame.add:
c = df.select_dtypes(np.number).columns
df[c] = df[c].add(df[c].shift() * 0.5, fill_value=0)
print (df)
A B C String_column
0 0.297925 -1.025012 1.307090 'a'
1 -1.378443 0.020945 0.003293 'b'
2 -2.410128 1.004794 0.237621 'c'
3 -0.869085 0.457898 1.214023 'd'
4 -0.987162 0.587249 1.340056 'e'
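To check the vectorized version on numbers that are easy to verify by hand, here is a minimal sketch. The toy frame and the names small, num_cols and shifted_result are illustrative assumptions, not the data from the question. Each row gets half of the original previous row added, and the first row is unchanged because fill_value=0 stands in for the missing shifted value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the result can be checked by hand
small = pd.DataFrame({
    "A": [1.0, 2.0, 4.0],
    "B": [10.0, 20.0, 40.0],
    "String_column": ["a", "b", "c"],
})

# Only the numeric columns take part in the arithmetic
num_cols = small.select_dtypes(np.number).columns

# shift() moves every row down by one, so row i sees the ORIGINAL row i-1;
# fill_value=0 treats the missing value above the first row as 0
shifted_result = small[num_cols].add(small[num_cols].shift() * 0.5, fill_value=0)
print(shifted_result)
```

Here column A becomes 1.0, 2.5, 5.0: the last value is 4.0 + 0.5 * 2.0, using the original second row rather than its updated value.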
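Edit: if each row should use the already updated values of the previous row, loop over the rows instead: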
c = df.select_dtypes(np.number).columns
for idx, row in df.iterrows():
    if row.name > 0:
        df.loc[idx, c] = df.loc[idx-1, c] * 0.5 + df.loc[idx, c]
print (df)
A B C String_column
0 0.297925 -1.025012 1.307090 'a'
1 -1.378443 0.020945 0.003293 'b'
2 -2.335647 0.748541 0.564393 'c'
3 -1.213695 0.463134 1.214847 'd'
4 -1.571074 0.774384 1.481154 'e'
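Answer 1 (score: 0)
There is no need to use apply; you can solve it as follows. Because the updated value of each row must be used when calculating the next row, a for loop is needed.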
cols = ['A','B','C']
for i in range(1, len(df)):
    df.loc[i, cols] = df.loc[i-1, cols] * 0.5 + df.loc[i, cols]
Result:
A B C String_column
0 0.297925 -1.025012 1.307090 'a'
1 -1.378443 0.020945 0.003293 'b'
2 -2.335647 0.748541 0.564393 'c'
3 -1.213695 0.463134 1.214847 'd'
4 -1.571074 0.774384 1.481154 'e'
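For longer frames, the same running calculation can be done on a NumPy array instead of assigning through loc one row at a time. This is only a sketch under assumptions: the toy data and the helper name add_half_of_previous are made up for illustration, and the numeric columns are assumed to be convertible to float.

```python
import numpy as np
import pandas as pd

def add_half_of_previous(frame, factor=0.5):
    """Return a copy where each numeric row has factor * (updated previous row) added."""
    out = frame.copy()
    num_cols = out.select_dtypes(np.number).columns
    # copy=True guarantees a writable array that is independent of the frame
    values = out[num_cols].to_numpy(dtype=float, copy=True)
    for i in range(1, len(values)):
        # values[i - 1] has already been updated, so the effect accumulates
        values[i] += factor * values[i - 1]
    out[num_cols] = values
    return out

# Hypothetical toy data, chosen so the result can be checked by hand
toy = pd.DataFrame({
    "A": [1.0, 2.0, 4.0],
    "B": [10.0, 20.0, 40.0],
    "String_column": ["a", "b", "c"],
})
running_result = add_half_of_previous(toy)
print(running_result)
```

On this data the third value of A is 5.25 (4.0 + 0.5 * 2.5), not the 5.0 that a plain shift() would give, because the second row had already been updated to 2.5 before it was used.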