Converting a DataFrame with category, start_time, end_time for plotting in pandas

Time: 2016-06-01 22:01:22

Tags: python pandas

I have a pandas DataFrame:

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 32656 entries, 94418 to 2
Data columns (total 8 columns):
customer_id             32656 non-null object
session_id              32656 non-null int64
start                   32656 non-null datetime64[ns, America/Los_Angeles]
end                     32656 non-null datetime64[ns, America/Los_Angeles]
length                  32656 non-null timedelta64[ns]
category                32656 non-null object
rounded_start           32656 non-null datetime64[ns, America/Los_Angeles]
rounded_end             32656 non-null datetime64[ns, America/Los_Angeles]
dtypes: datetime64[ns, America/Los_Angeles](4), int64(1), object(2), timedelta64[ns](1)
memory usage: 2.2+ MB

I have also created a DatetimeIndex:

rng = pd.date_range(df['rounded_start'].min(), end=df['rounded_start'].max(), freq='5Min')
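
As a small illustration of what such a range contains, here is a sketch with two made-up timestamps standing in for the minimum and maximum of `rounded_start` (the values and the names `first`, `last`, and `demo_rng` are assumptions, not taken from the data above):

```python
import pandas as pd

# Hypothetical stand-ins for df['rounded_start'].min() and .max()
first = pd.Timestamp('2016-06-01 09:00', tz='America/Los_Angeles')
last = pd.Timestamp('2016-06-01 09:20', tz='America/Los_Angeles')

# One timestamp every 5 minutes, with both endpoints included
demo_rng = pd.date_range(first, end=last, freq='5min')

print(len(demo_rng))  # 5 points: 09:00, 09:05, 09:10, 09:15, 09:20
print(demo_rng.tz)    # the time zone is carried over from the endpoints
```

Each element of the range is a single point in time, so "during this period" has to be defined relative to those points, for example as "sessions that are active at that instant".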

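How can I tie the two together so that I can plot every point in the range on the x-axis and show the number of categories that are covered at that point in time?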

1 answer:

Answer 0 (score: 0)

I suspect this will work, but I have not verified it.

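import pandas as pd

def count_cats(date, df):
    # A session is active at `date` if it started on or before it
    # and ended on or after it.
    condition1 = df['start'] <= date
    condition2 = df['end'] >= date
    df_slice = df.loc[condition1 & condition2, 'category']
    # Number of distinct categories among the active sessions
    return df_slice.nunique()

# Evaluate every timestamp in rng and collect the counts, indexed by rng
df_count = pd.DataFrame({'CountCats': [count_cats(ts, df) for ts in rng]}, index=rng)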
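
To make the idea concrete, here is a minimal, self-contained sketch on three made-up sessions. The data, the helper name `count_active_categories`, and the use of `dt.floor` to build `rounded_start` are all assumptions for illustration. It counts, for each 5-minute point, the distinct categories among sessions whose `start`/`end` interval contains that point.

```python
import pandas as pd

tz = 'America/Los_Angeles'

# Made-up sessions; only the columns needed for counting are included
df = pd.DataFrame({
    'start': pd.to_datetime(
        ['2016-06-01 09:00', '2016-06-01 09:03', '2016-06-01 09:12']
    ).tz_localize(tz),
    'end': pd.to_datetime(
        ['2016-06-01 09:10', '2016-06-01 09:06', '2016-06-01 09:20']
    ).tz_localize(tz),
    'category': ['a', 'b', 'a'],
})
# Assumption: rounded_start is start floored to the 5-minute grid
df['rounded_start'] = df['start'].dt.floor('5min')

rng = pd.date_range(df['rounded_start'].min(),
                    end=df['rounded_start'].max(), freq='5min')

def count_active_categories(ts, sessions):
    """Count distinct categories among sessions active at timestamp ts."""
    active = (sessions['start'] <= ts) & (sessions['end'] >= ts)
    return sessions.loc[active, 'category'].nunique()

df_count = pd.DataFrame(
    {'CountCats': [count_active_categories(ts, df) for ts in rng]},
    index=rng,
)

print(df_count['CountCats'].tolist())  # [1, 2, 1]
```

With matplotlib installed, `df_count['CountCats'].plot()` should then draw the counts with the range on the x-axis. This loop runs one filter over the sessions per timestamp, which is simple but may be slow for long ranges.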