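Add years to a date column based on a condition in pandas

Time: 2020-10-01 19:15:42

Tags: python pandas datetime

I have a DataFrame in which one column is a date (DATECOL). My goal is to add a number of years to that date, but the number is not fixed: it is determined by the values in other columns.

For example, in SQL this could be written as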

SELECT CASE
    WHEN COL1 = 'AB' THEN DATEADD(year, 2, DATECOL)
    WHEN COL1 = 'XY' AND COL2 = 'PQR' THEN DATEADD(year, 2, DATECOL)
    WHEN COL1 = 'XY' AND COL2 != 'PQR' THEN DATEADD(year, 3, DATECOL)
END AS NEWCOL
FROM DATAFRAME

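Can someone help me implement this logic in pandas?

3 answers:

Answer 0 (score: 2)

Consider numpy.select, which takes a list of logical conditions and a matching list of values: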

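import numpy as np
import pandas as pd

# One boolean mask per CASE branch, in the same order as the SQL.
conditions = [(df['COL1'] == 'AB'),
              (df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
              (df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]

# The value to use for each branch; DATECOL must already be a datetime column.
choices = [df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=3)]

# Rows that match no condition get NaT, like a CASE without ELSE.
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))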
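To try this end to end, here is a minimal self-contained sketch. The sample rows are made up for illustration (they are not from the question), and the fourth row is there only to show what happens when no condition matches.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "COL1": ["AB", "XY", "XY", "ZZ"],
    "COL2": ["PQR", "PQR", "bla", "PQR"],
    "DATECOL": pd.to_datetime(
        ["2010-10-01", "2011-10-01", "2012-10-01", "2013-10-01"]
    ),
})

conditions = [
    df["COL1"] == "AB",
    (df["COL1"] == "XY") & (df["COL2"] == "PQR"),
    (df["COL1"] == "XY") & (df["COL2"] != "PQR"),
]

choices = [
    df["DATECOL"] + pd.DateOffset(years=2),
    df["DATECOL"] + pd.DateOffset(years=2),
    df["DATECOL"] + pd.DateOffset(years=3),
]

# The first matching condition wins for each row; unmatched rows get NaT.
df["NEWCOL"] = np.select(conditions, choices, default=np.datetime64("NaT"))

print(df)
```

With this data the "ZZ" row should end up with NaT in NEWCOL, which mirrors the NULL that the SQL CASE would return without an ELSE branch.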

Answer 1 (score: 0)

First, make sure your DATECOL column has a datetime dtype. Then you can do the following:

import pandas as pd

df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'], 'COL2': ['PQR', 'bla', 'PQR'], 'DATECOL': ['20101001', '20111001', '20121001']})

df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d')  # adjust the format to match your data

c1 = df['COL1']=='AB'
c2 = (df['COL1']=='XY') & (df['COL2']=='PQR')
c3 = (df['COL1']=='XY') & (df['COL2']!='PQR')

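# Apply each offset to its own rows. The three masks are mutually exclusive,
# so every update can run unconditionally; an if/elif chain would stop after
# the first group that has any matching rows and skip the others.
# Rows that match none of the masks keep their original date.
df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)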
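Since the question says the number of years varies, one possible variation is to first compute the number of years per row and then apply one offset per distinct value. This is only a sketch: the `years` and `new_col` names and the extra "ZZ" row are my own assumptions, and it writes to NEWCOL instead of overwriting DATECOL.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above, plus one hypothetical unmatched row.
df = pd.DataFrame({
    "COL1": ["AB", "XY", "XY", "ZZ"],
    "COL2": ["PQR", "bla", "PQR", "PQR"],
    "DATECOL": ["20101001", "20111001", "20121001", "20131001"],
})
df["DATECOL"] = pd.to_datetime(df["DATECOL"], format="%Y%m%d")

c1 = df["COL1"] == "AB"
c2 = (df["COL1"] == "XY") & (df["COL2"] == "PQR")
c3 = (df["COL1"] == "XY") & (df["COL2"] != "PQR")

# Number of years to add per row; 0 marks rows where no rule matched.
years = pd.Series(np.select([c1, c2, c3], [2, 2, 3], default=0), index=df.index)

# Start from an all-NaT column, then fill one group of rows per distinct value.
new_col = pd.Series(pd.NaT, index=df.index, dtype=df["DATECOL"].dtype)
for n in years[years > 0].unique():
    mask = years == n
    # int(n) converts the NumPy integer to a plain Python int for DateOffset.
    new_col.loc[mask] = df.loc[mask, "DATECOL"] + pd.DateOffset(years=int(n))

df["NEWCOL"] = new_col
print(df)
```

The loop runs once per distinct year count rather than once per row, so it should stay reasonably fast as long as there are only a few different values.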

Answer 2 (score: 0)

How about using np.where with the conditions Akanksha Atrey mentioned?

np.where(first_case_condition, add_years, np.where(second_condition, ...))
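Spelled out for the three cases in the question, the nested np.where idea might look like the sketch below. The sample data and the variable names (`is_ab`, `plus2`, and so on) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "COL1": ["AB", "XY", "XY", "ZZ"],
    "COL2": ["PQR", "PQR", "bla", "PQR"],
    "DATECOL": pd.to_datetime(
        ["2010-10-01", "2011-10-01", "2012-10-01", "2013-10-01"]
    ),
})

is_ab = df["COL1"] == "AB"
is_xy_pqr = (df["COL1"] == "XY") & (df["COL2"] == "PQR")
is_xy_other = (df["COL1"] == "XY") & (df["COL2"] != "PQR")

# Precompute the shifted dates as NumPy arrays so np.where sees datetime64 values.
plus2 = (df["DATECOL"] + pd.DateOffset(years=2)).to_numpy()
plus3 = (df["DATECOL"] + pd.DateOffset(years=3)).to_numpy()
nat = np.datetime64("NaT")

# Each np.where handles one branch; the innermost fallback is NaT.
df["NEWCOL"] = np.where(
    is_ab, plus2,
    np.where(is_xy_pqr, plus2,
             np.where(is_xy_other, plus3, nat)),
)

print(df)
```

This gives the same result as np.select for these conditions, but the nesting gets harder to read as the number of cases grows, so np.select is probably the more comfortable choice beyond two or three branches.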