来自其他列的具有重复值的新列

时间:2018-11-15 09:50:05

标签: pandas

我有以下代码。我需要添加一列deaths_last_tuesday,该列显示每天上一个星期二的死亡人数。

import pandas as pd

data = {'date': ['2014-05-01', '2014-05-02', '2014-05-03', '2014-05-04', '2014-05-05', '2014-05-06', '2014-05-07',
                 '2014-05-08', '2014-05-09', '2014-05-10', '2014-05-11', '2014-05-12', '2014-05-13', '2014-05-14',
                 '2014-05-15', '2014-05-16', '2014-05-17', '2014-05-18', '2014-05-19', '2014-05-20'],
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41, 23, 56, 23, 34, 23, 67, 54, 34, 45, 12]}

df = pd.DataFrame(data, columns=['date', 'battle_deaths'])

df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.dayofweek
df = df.set_index('date')
df.sort_index()


            battle_deaths  day_of_week deaths_last_tuesday
date                                  
2014-05-01             34            3 
2014-05-02             25            4   24
2014-05-03             26            5   24
2014-05-04             15            6   24
2014-05-05             15            0   24
2014-05-06             14            1   24
2014-05-07             26            2   24
2014-05-08             25            3   24
2014-05-09             62            4   25
2014-05-10             41            5   25
2014-05-11             23            6   25
2014-05-12             56            0   25

我想这样做,以便将每天的死亡人数与上一个星期二的死亡人数进行比较。

1 个答案:

答案 0 :(得分:1)

使用:

df['deaths_last_tuesday'] = df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift()

print (df)
            battle_deaths  day_of_week  deaths_last_tuesday
date                                                       
2014-05-01             34            3                  NaN
2014-05-02             25            4                 34.0
2014-05-03             26            5                 34.0
2014-05-04             15            6                 34.0
2014-05-05             15            0                 34.0
2014-05-06             14            1                 34.0
2014-05-07             26            2                 34.0
2014-05-08             25            3                 34.0
2014-05-09             62            4                 25.0
2014-05-10             41            5                 25.0
2014-05-11             23            6                 25.0
2014-05-12             56            0                 25.0
2014-05-13             23            1                 25.0
2014-05-14             34            2                 25.0
2014-05-15             23            3                 25.0
2014-05-16             67            4                 23.0
2014-05-17             54            5                 23.0
2014-05-18             34            6                 23.0
2014-05-19             45            0                 23.0
2014-05-20             12            1                 23.0

说明

首先通过eq==)进行比较:

print (df['day_of_week'].eq(3))
date
2014-05-01     True
2014-05-02    False
2014-05-03    False
2014-05-04    False
2014-05-05    False
2014-05-06    False
2014-05-07    False
2014-05-08     True
2014-05-09    False
2014-05-10    False
2014-05-11    False
2014-05-12    False
2014-05-13    False
2014-05-14    False
2014-05-15     True
2014-05-16    False
2014-05-17    False
2014-05-18    False
2014-05-19    False
2014-05-20    False
Name: day_of_week, dtype: bool

然后为where的不匹配值创建缺失值:

print (df['battle_deaths'].where(df['day_of_week'].eq(3)))
date
2014-05-01    34.0
2014-05-02     NaN
2014-05-03     NaN
2014-05-04     NaN
2014-05-05     NaN
2014-05-06     NaN
2014-05-07     NaN
2014-05-08    25.0
2014-05-09     NaN
2014-05-10     NaN
2014-05-11     NaN
2014-05-12     NaN
2014-05-13     NaN
2014-05-14     NaN
2014-05-15    23.0
2014-05-16     NaN
2014-05-17     NaN
2014-05-18     NaN
2014-05-19     NaN
2014-05-20     NaN
Name: battle_deaths, dtype: float64

Forwrd填写缺失值:

print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill())
date
2014-05-01    34.0
2014-05-02    34.0
2014-05-03    34.0
2014-05-04    34.0
2014-05-05    34.0
2014-05-06    34.0
2014-05-07    34.0
2014-05-08    25.0
2014-05-09    25.0
2014-05-10    25.0
2014-05-11    25.0
2014-05-12    25.0
2014-05-13    25.0
2014-05-14    25.0
2014-05-15    23.0
2014-05-16    23.0
2014-05-17    23.0
2014-05-18    23.0
2014-05-19    23.0
2014-05-20    23.0
Name: battle_deaths, dtype: float64

最后shift

print (df['battle_deaths'].where(df['day_of_week'].eq(3)).ffill().shift())
date
2014-05-01     NaN
2014-05-02    34.0
2014-05-03    34.0
2014-05-04    34.0
2014-05-05    34.0
2014-05-06    34.0
2014-05-07    34.0
2014-05-08    34.0
2014-05-09    25.0
2014-05-10    25.0
2014-05-11    25.0
2014-05-12    25.0
2014-05-13    25.0
2014-05-14    25.0
2014-05-15    25.0
2014-05-16    23.0
2014-05-17    23.0
2014-05-18    23.0
2014-05-19    23.0
2014-05-20    23.0
Name: battle_deaths, dtype: float64