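I have a DataFrame in which one column is a date (DATECOL). My goal is to add a number of years to that date, but the number is not fixed: it is determined by the values in other columns.
For example, in SQL this could be written as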
SELECT CASE
         WHEN COL1 = 'AB' THEN DATEADD(year, 2, DATECOL)
         WHEN COL1 = 'XY' AND COL2 = 'PQR' THEN DATEADD(year, 2, DATECOL)
         WHEN COL1 = 'XY' AND COL2 != 'PQR' THEN DATEADD(year, 3, DATECOL)
       END AS NEWCOL
FROM DATAFRAME
Can someone help me implement this logic in pandas?
Answer 0: (score: 2)
Consider numpy.select, which takes a list of logical conditions and a list of corresponding values:
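import numpy as np
import pandas as pd

# Conditions are checked in order; the first one that is True for a row wins
conditions = [(df['COL1'] == 'AB'),
              (df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
              (df['COL1'] == 'XY') & (df['COL2'] != 'PQR')]
# One candidate value per condition, in the same order
choices = [df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=2),
           df['DATECOL'] + pd.DateOffset(years=3)]
# Rows matching no condition get NaT, like a CASE with no ELSE branch
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))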
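For anyone who wants to try this end to end, here is a minimal sketch of the same np.select approach on a small made-up frame. The sample rows (including the 'ZZ' row, which matches no condition) are assumptions for illustration and are not from the question; DATECOL is assumed to already be a datetime column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row matches none of the conditions
df = pd.DataFrame({
    'COL1': ['AB', 'XY', 'XY', 'ZZ'],
    'COL2': ['PQR', 'PQR', 'bla', 'PQR'],
    'DATECOL': pd.to_datetime(['2010-10-01', '2011-10-01',
                               '2012-10-01', '2013-10-01']),
})

# Same conditions and choices as in the CASE expression
conditions = [
    df['COL1'] == 'AB',
    (df['COL1'] == 'XY') & (df['COL2'] == 'PQR'),
    (df['COL1'] == 'XY') & (df['COL2'] != 'PQR'),
]
choices = [
    df['DATECOL'] + pd.DateOffset(years=2),
    df['DATECOL'] + pd.DateOffset(years=2),
    df['DATECOL'] + pd.DateOffset(years=3),
]

# Unmatched rows fall through to the default and become NaT
df['NEWCOL'] = np.select(conditions, choices, default=np.datetime64('NaT'))
print(df)
```

The 'ZZ' row should come out as NaT in NEWCOL, which mirrors the NULL that the SQL CASE would return when no WHEN branch matches.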
Answer 1: (score: 0)
First, make sure your DATECOL column is of type datetime. Then you can do the following:
import pandas as pd

df = pd.DataFrame({'COL1': ['AB', 'XY', 'XY'],
                   'COL2': ['PQR', 'bla', 'PQR'],
                   'DATECOL': ['20101001', '20111001', '20121001']})
df['DATECOL'] = pd.to_datetime(df['DATECOL'], format='%Y%m%d')  # change format as per your need

# Boolean masks, one per case
c1 = df['COL1'] == 'AB'
c2 = (df['COL1'] == 'XY') & (df['COL2'] == 'PQR')
c3 = (df['COL1'] == 'XY') & (df['COL2'] != 'PQR')

# Each mask is applied independently, so every matching group of rows is updated
df.loc[c1, 'DATECOL'] += pd.offsets.DateOffset(years=2)
df.loc[c2, 'DATECOL'] += pd.offsets.DateOffset(years=2)
df.loc[c3, 'DATECOL'] += pd.offsets.DateOffset(years=3)
Answer 2: (score: 0)
How about using np.where with the conditions Akanksha Atrey mentioned?
np.where(first_case_condition, add_years, np.where(second_condition, ...))
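Spelled out, the nested np.where idea might look like the sketch below. The sample frame and the variable names (is_ab, plus_two, and so on) are made up for illustration, and DATECOL is assumed to already be datetime. Each np.where supplies the value where its condition is True and defers to the next np.where otherwise; the innermost fallback is NaT.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last row matches none of the conditions
df = pd.DataFrame({
    'COL1': ['AB', 'XY', 'XY', 'ZZ'],
    'COL2': ['PQR', 'PQR', 'bla', 'PQR'],
    'DATECOL': pd.to_datetime(['2010-10-01', '2011-10-01',
                               '2012-10-01', '2013-10-01']),
})

# Boolean masks for the three cases
is_ab = df['COL1'] == 'AB'
is_xy_pqr = (df['COL1'] == 'XY') & (df['COL2'] == 'PQR')
is_xy_other = (df['COL1'] == 'XY') & (df['COL2'] != 'PQR')

# Candidate results, computed once for the whole column
plus_two = df['DATECOL'] + pd.DateOffset(years=2)
plus_three = df['DATECOL'] + pd.DateOffset(years=3)

# Outer conditions take priority; the innermost else-value is NaT
df['NEWCOL'] = np.where(
    is_ab, plus_two,
    np.where(is_xy_pqr, plus_two,
             np.where(is_xy_other, plus_three, np.datetime64('NaT'))))
print(df)
```

With three or more cases the nesting gets hard to read, which is one reason np.select is often the more comfortable choice for this kind of CASE logic.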