我有以下代码。我需要添加一列deaths_last_tuesday
,该列显示每天上一个星期二的死亡人数。
import pandas as pd
data = {'date': ['2014-05-01', '2014-05-02', '2014-05-03', '2014-05-04', '2014-05-05', '2014-05-06', '2014-05-07',
'2014-05-08', '2014-05-09', '2014-05-10', '2014-05-11', '2014-05-12', '2014-05-13', '2014-05-14',
'2014-05-15', '2014-05-16', '2014-05-17', '2014-05-18', '2014-05-19', '2014-05-20'],
'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41, 23, 56, 23, 34, 23, 67, 54, 34, 45, 12]}
df = pd.DataFrame(data, columns=['date', 'battle_deaths'])
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.dayofweek
df = df.set_index('date')
df.sort_index()
battle_deaths day_of_week deaths_last_tuesday
date
2014-05-01 34 3
2014-05-02 25 4 24
2014-05-03 26 5 24
2014-05-04 15 6 24
2014-05-05 15 0 24
2014-05-06 14 1 24
2014-05-07 26 2 24
2014-05-08 25 3 24
2014-05-09 62 4 25
2014-05-10 41 5 25
2014-05-11 23 6 25
2014-05-12 56 0 25
我想这样做,以便将每天的死亡人数与上一个星期二的死亡人数进行比较。
答案 0 :(得分:1)
使用:
df['deaths_last_tuesday'] = df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift()
print (df)
battle_deaths day_of_week deaths_last_tuesday
date
2014-05-01 34 3 NaN
2014-05-02 25 4 34.0
2014-05-03 26 5 34.0
2014-05-04 15 6 34.0
2014-05-05 15 0 34.0
2014-05-06 14 1 34.0
2014-05-07 26 2 34.0
2014-05-08 25 3 34.0
2014-05-09 62 4 25.0
2014-05-10 41 5 25.0
2014-05-11 23 6 25.0
2014-05-12 56 0 25.0
2014-05-13 23 1 25.0
2014-05-14 34 2 25.0
2014-05-15 23 3 25.0
2014-05-16 67 4 23.0
2014-05-17 54 5 23.0
2014-05-18 34 6 23.0
2014-05-19 45 0 23.0
2014-05-20 12 1 23.0
说明:
首先通过eq
(==
)进行比较:
print (df['day_of_week'].eq(3))
date
2014-05-01 True
2014-05-02 False
2014-05-03 False
2014-05-04 False
2014-05-05 False
2014-05-06 False
2014-05-07 False
2014-05-08 True
2014-05-09 False
2014-05-10 False
2014-05-11 False
2014-05-12 False
2014-05-13 False
2014-05-14 False
2014-05-15 True
2014-05-16 False
2014-05-17 False
2014-05-18 False
2014-05-19 False
2014-05-20 False
Name: day_of_week, dtype: bool
然后为where
的不匹配值创建缺失值:
print (df['battle_deaths'].where(df['day_of_week'].eq(3)))
date
2014-05-01 34.0
2014-05-02 NaN
2014-05-03 NaN
2014-05-04 NaN
2014-05-05 NaN
2014-05-06 NaN
2014-05-07 NaN
2014-05-08 25.0
2014-05-09 NaN
2014-05-10 NaN
2014-05-11 NaN
2014-05-12 NaN
2014-05-13 NaN
2014-05-14 NaN
2014-05-15 23.0
2014-05-16 NaN
2014-05-17 NaN
2014-05-18 NaN
2014-05-19 NaN
2014-05-20 NaN
Name: battle_deaths, dtype: float64
Forwrd填写缺失值:
print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill())
date
2014-05-01 34.0
2014-05-02 34.0
2014-05-03 34.0
2014-05-04 34.0
2014-05-05 34.0
2014-05-06 34.0
2014-05-07 34.0
2014-05-08 25.0
2014-05-09 25.0
2014-05-10 25.0
2014-05-11 25.0
2014-05-12 25.0
2014-05-13 25.0
2014-05-14 25.0
2014-05-15 23.0
2014-05-16 23.0
2014-05-17 23.0
2014-05-18 23.0
2014-05-19 23.0
2014-05-20 23.0
Name: battle_deaths, dtype: float64
最后shift
:
print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift())
date
2014-05-01 NaN
2014-05-02 34.0
2014-05-03 34.0
2014-05-04 34.0
2014-05-05 34.0
2014-05-06 34.0
2014-05-07 34.0
2014-05-08 34.0
2014-05-09 25.0
2014-05-10 25.0
2014-05-11 25.0
2014-05-12 25.0
2014-05-13 25.0
2014-05-14 25.0
2014-05-15 25.0
2014-05-16 23.0
2014-05-17 23.0
2014-05-18 23.0
2014-05-19 23.0
2014-05-20 23.0
Name: battle_deaths, dtype: float64