Counter with multiple conditions based on dates

Time: 2019-03-25 15:49:03

Tags: pandas pandas-groupby data-science np

I have this DataFrame:

    df:
        entrance   leaving        counter
    1   2012-07-01  NaT             NaN
    2   2013-03-15  NaT             NaN
    3   2013-03-15  2013-04-15      NaN
    4   2014-06-01  NaT             NaN
    5   2014-06-01  NaT             NaN

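I want a counter that takes both date columns into account: it should go up by one on each `entrance` date and down by one on each `leaving` date. In addition, the `date` column of the result should advance in steps of one month. The desired output is:

    df_new:
        date     counter
        2012-07  1
        2012-08  1
        ...      ...
        2013-03  2
        ...      ...
        2014-06  4

I have already implemented the increment with `np.where()` wherever `df.entrance.notnull()`, but I can't get the decrement based on `leaving` to work.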

1 answer:

Answer 0 (score: 0)

I believe your question is underspecified: the counter cannot share the index of the original DataFrame. Here is an example of why:

    # Let's assume this is the DF:
    entrance   leaving        counter
1   2012-07-01  NaT             1
2   2013-03-15  NaT             2
3   2013-03-15  2013-06-15      2 ?
4   2013-06-01  NaT             3 or 4? Depends if you count the exit in prev row or not

Either way, here is a solution:

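    # Load data
    import io

    import pandas as pd

    s = '''     entrance   leaving        counter
    1   2012-07-01  NaT             NaN
    2   2013-03-15  NaT             NaN
    3   2013-03-15  2013-04-15      NaN
    4   2014-06-01  NaT             NaN
    5   2014-06-01  NaT             NaN'''

    # DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it. The header
    # row has one field fewer than the data rows, so the first column becomes the index.
    df = pd.read_csv(io.StringIO(s), sep=r'\s+')
    df['leaving'] = pd.to_datetime(df['leaving'])
    df['entrance'] = pd.to_datetime(df['entrance'])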

An explicit solution that does not follow the original index:

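    # Counter: +1 on every entrance date, -1 on every leaving date,
    # sorted by date so the running total is chronological
    counter = (pd.Series(1, df['entrance'].dropna())
               .subtract(pd.Series(1, df['leaving'].dropna()), fill_value=0)
               .sort_index()
               .cumsum())

    # If you want it monthly ('M' = month end; pandas >= 2.2 prefers the alias 'ME')
    counter.resample('M').last().ffill()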
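The `resample('M')` line yields month-end timestamps. If you want the exact `df_new` layout from the question (one `YYYY-MM` row per month, quiet months included), here is a minimal sketch that buckets the same +1/-1 events into monthly periods with `to_period('M')` and `pd.period_range` instead of resampling. It assumes each event counts in the calendar month it falls in; the sample data is rebuilt inline, and the names `entr`, `leav`, `net` and `months` are illustrative only.

```python
import pandas as pd

# Sample data from the question (NaT = has not left yet)
df = pd.DataFrame({
    "entrance": pd.to_datetime(["2012-07-01", "2013-03-15", "2013-03-15",
                                "2014-06-01", "2014-06-01"]),
    "leaving": pd.to_datetime([None, None, "2013-04-15", None, None]),
})

# Month (as a Period) of every entrance and every leaving event
entr = pd.DatetimeIndex(df["entrance"].dropna()).to_period("M")
leav = pd.DatetimeIndex(df["leaving"].dropna()).to_period("M")

# Net change per month: +1 per entrance, -1 per leaving
net = (pd.Series(1, index=entr).groupby(level=0).sum()
       .sub(pd.Series(1, index=leav).groupby(level=0).sum(), fill_value=0))

# Every month from the first to the last event, so quiet months are listed too
events = entr.append(leav)
months = pd.period_range(events.min(), events.max(), freq="M")

# Running total, one row per YYYY-MM as in df_new
counter = net.reindex(months, fill_value=0).cumsum().astype(int)
df_new = pd.DataFrame({"date": months.strftime("%Y-%m"),
                       "counter": counter.to_numpy()})
print(df_new)
```

With this sample data the counter for 2013-03 comes out as 3 rather than the 2 shown in the question, because both 2013-03-15 entrances fall in March and the only leaving is in April.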

A solution that keeps the original index, but is somewhat ambiguous (leavings are counted in row order, not by date):

    # Running count of non-null entries per column, in row order
    count_df = df.notna().cumsum()
    df['counter'] = count_df['entrance'] - count_df['leaving']
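To make the ambiguity concrete, the sketch below runs the row-order counter on the hypothetical four-row frame from the start of this answer, next to one possible date-aware reading that still keeps the original index: for each row, the number of entrances on or before that row's `entrance` date minus the number of leavings on or before it, computed with `numpy.searchsorted`. That date rule is an assumption, not something the question specifies, and it assumes `entrance` is never missing; the column names `by_row` and `by_date` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame from the example above: row 3 leaves after row 4 enters
df = pd.DataFrame({
    "entrance": pd.to_datetime(["2012-07-01", "2013-03-15", "2013-03-15",
                                "2013-06-01"]),
    "leaving": pd.to_datetime([None, None, "2013-06-15", None]),
})
df.index = [1, 2, 3, 4]

# Row-order counter: a leaving counts as soon as its row is reached
count_df = df.notna().cumsum()
df["by_row"] = count_df["entrance"] - count_df["leaving"]

# Date-aware counter (assumed reading, no missing entrance dates):
# entrances on or before each row's entrance date minus leavings on or before it
entr = np.sort(df["entrance"].to_numpy())
leav = np.sort(df["leaving"].dropna().to_numpy())
when = df["entrance"].to_numpy()
df["by_date"] = (np.searchsorted(entr, when, side="right")
                 - np.searchsorted(leav, when, side="right"))
print(df)
```

Under the date reading, row 4 gets 4 because the 2013-06-15 leaving has not happened yet on 2013-06-01, and rows 2 and 3 both get 3 because same-day entrances are counted together; the row-order version gives 3 for row 4 and 2 for rows 2 and 3.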