I have a DataFrame like this:
name value variation
A 2 0.97
B 3 1.2
C NaN 1.1
D NaN 0.8
E NaN 0.87
F 4 1.1
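I need to fill each NaN with the previous row's value * variation. When the previous row was also NaN, its already-filled value should be used: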
name value variation
A 2 0.97
B 3 1.2
C 3.6 1.1 >> 3*1.2
D 3.96 0.8 >> 3.6*1.1
E 3.168 0.87 >> 3.96*0.8
F 4 1.1
I think there should be an efficient way to do this, but I couldn't find a related question on Stack Overflow.
Thanks!
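For reference, the fill rule is a simple recurrence: value[i] = value[i-1] * variation[i-1] whenever value[i] is NaN. A minimal sketch as a plain loop, assuming the first row is not NaN (otherwise there is no previous value to start from). It iterates in Python, so it is mainly useful for small frames or for checking a vectorized answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': list('ABCDEF'),
                   'value': [2, 3, np.nan, np.nan, np.nan, 4],
                   'variation': [0.97, 1.2, 1.1, 0.8, 0.87, 1.1]})

# Work on NumPy copies so each filled value is visible to the next row.
values = df['value'].to_numpy(dtype=float, copy=True)
variation = df['variation'].to_numpy(dtype=float)

# Assumes row 0 is not NaN; a leading NaN would stay NaN because NaN * x is NaN.
for i in range(1, len(values)):
    if np.isnan(values[i]):
        values[i] = values[i - 1] * variation[i - 1]

df['value'] = values
print(df)
```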
Answer 0 (score: 1)
Use:
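import numpy as np
import pandas as pd

df = pd.DataFrame({'name': list('ABCDEF'),
                   'value': [2, 3, np.nan, np.nan, np.nan, 4],
                   'variation': [0.97, 1.2, 1.1, 0.8, 0.87, 1.1]})
# duplicate the rows to get 2 groups of NaNs for better sample data
df = pd.concat([df] * 2, ignore_index=True)
# mask the missing values plus the one non-missing value just before each run
m = df['value'].isna() | df['value'].isna().shift(-1, fill_value=False)
# number each run of consecutive Trues and keep only the masked rows
g = m.ne(m.shift()).cumsum()[m]
# cumulative product of variation per group, shifted down one row, multiplied by the forward-filled value
s = df['variation'].groupby(g).cumprod().shift() * df['value'].ffill()
# replace the missing values with Series s
df['value'] = df['value'].fillna(s)
print(df)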
name value variation
0 A 2.000 0.97
1 B 3.000 1.20
2 C 3.600 1.10
3 D 3.960 0.80
4 E 3.168 0.87
5 F 4.000 1.10
6 A 2.000 0.97
7 B 3.000 1.20
8 C 3.600 1.10
9 D 3.960 0.80
10 E 3.168 0.87
11 F 4.000 1.10
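Why `cumprod` followed by `shift` gives the right numbers, written as a formula. Here k is the (0-based) position of the last non-missing row before a run of NaNs; the cumulative product over the group starting at row k reaches variation_k ⋯ variation_{k+j-1} at row k+j-1, and the shift moves it onto row k+j:

$$
\text{value}_{k+j} = \text{value}_k \prod_{i=k}^{k+j-1} \text{variation}_i, \qquad j = 1, 2, \dots
$$

For row E (position 4) with k = 1 (row B): value_4 = 3 × 1.2 × 1.1 × 0.8 = 3.168.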
Answer 1 (score: 1)
This uses pandas version 0.25.1:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name':['A','B','C','D','E','F'], 'value':[2,3,np.nan, np.nan, np.nan, 4],'variation':[0.97,1.2,1.1,0.8,0.87,1.1]})
# forward-fill the last known value into a helper column
df['value_ffill'] = df['value'].ffill()
# fill each NaN with the forward-filled value times the current row's variation
df['value'] = df['value'].fillna(df['value_ffill'] * df['variation'])
df.drop(columns=['value_ffill'], inplace=True)
print(df)
# name value variation
# 0 A 2.00 0.97
# 1 B 3.00 1.20
# 2 C 3.30 1.10
# 3 D 2.40 0.80
# 4 E 2.61 0.87
# 5 F 4.00 1.10
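The approach above multiplies the forward-filled value by the current row's variation and does not compound across consecutive NaNs. That is why its output (3.30, 2.40, 2.61) differs from the 3.6, 3.96, 3.168 expected in the question. A minimal sketch of a compounding variant follows, assuming (as in the question) that the first row is not NaN. It starts a new group at every non-missing value, takes the cumulative product of variation within each group, and shifts that product down one row within the group. The names `grp` and `factor` are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [2, 3, np.nan, np.nan, np.nan, 4],
                   'variation': [0.97, 1.2, 1.1, 0.8, 0.87, 1.1]})

# Each non-missing value opens a new group; the NaNs after it join that group.
grp = df['value'].notna().cumsum()
# Cumulative product of variation within each group, shifted down one row
# inside the group, so row i gets variation[k] * ... * variation[i-1].
factor = df['variation'].groupby(grp).cumprod().groupby(grp).shift()
# Last known value times the compounded factor; only used where value is NaN.
df['value'] = df['value'].fillna(df['value'].ffill() * factor)
print(df)
```

Because every non-missing row already opens its own group, this grouping does not need the extra `shift(-1)` mask used in answer 0, and it handles any number of NaN runs without special treatment.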