I have a pandas DataFrame:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 32656 entries, 94418 to 2
Data columns (total 8 columns):
customer_id 32656 non-null object
session_id 32656 non-null int64
start 32656 non-null datetime64[ns, America/Los_Angeles]
end 32656 non-null datetime64[ns, America/Los_Angeles]
length 32656 non-null timedelta64[ns]
category 32656 non-null object
rounded_start 32656 non-null datetime64[ns, America/Los_Angeles]
rounded_end 32656 non-null datetime64[ns, America/Los_Angeles]
dtypes: datetime64[ns, America/Los_Angeles](4), int64(1), object(2), timedelta64[ns](1)
memory usage: 2.2+ MB
I have also created a DatetimeIndex:
rng = pd.date_range(df['rounded_start'].min(), end=df['rounded_start'].max(), freq='5Min')
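How can I tie the two together so that I can plot every point of the range on the x-axis and show the number of categories that are active at that point in time?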
Answer 0 (score: 0)
I suspect this works, but I haven't verified it.
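import pandas as pd

df_count = pd.DataFrame(index=rng)

def count_cats(date, df):
    # A session is active at `date` if it started on or before it
    # and ended on or after it.
    condition1 = df['start'] <= date
    condition2 = df['end'] >= date
    df_slice = df.loc[condition1 & condition2, 'category']
    # Number of distinct categories among the active sessions.
    return df_slice.nunique()

# DataFrame.apply does not call the function row by row on a frame that has
# no columns, so evaluate each timestamp of the index explicitly instead.
df_count['CountCats'] = [count_cats(date, df) for date in df_count.index]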
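To make the idea easier to check, here is a minimal self-contained sketch of the same per-timestamp count on a tiny made-up dataset. The three sample sessions, their timestamps, and the category labels are assumptions invented for illustration; only the column names and the time zone come from the question. It evaluates each timestamp of the range with a list comprehension, which is one possible way to do it and not necessarily the fastest for 32,656 rows.

```python
import pandas as pd

tz = "America/Los_Angeles"

# Hypothetical sample sessions (not the asker's data); only the columns
# needed for the count are included.
df = pd.DataFrame({
    "start": pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 00:03", "2020-01-01 00:10"]
    ).tz_localize(tz),
    "end": pd.to_datetime(
        ["2020-01-01 00:07", "2020-01-01 00:12", "2020-01-01 00:20"]
    ).tz_localize(tz),
    "category": ["a", "b", "a"],
})
# Assumption: rounded_start is the start floored to the 5-minute grid.
df["rounded_start"] = df["start"].dt.floor("5min")

# Same construction as in the question: one timestamp every 5 minutes.
rng = pd.date_range(df["rounded_start"].min(),
                    end=df["rounded_start"].max(), freq="5min")

def count_cats(date, df):
    # Sessions whose [start, end] interval contains `date`.
    active = (df["start"] <= date) & (df["end"] >= date)
    # Number of distinct categories among those sessions.
    return df.loc[active, "category"].nunique()

df_count = pd.DataFrame(
    {"CountCats": [count_cats(date, df) for date in rng]},
    index=rng,
)
print(df_count)
```

With matplotlib installed, `df_count["CountCats"].plot()` should then draw the counts against the 5-minute grid on the x-axis.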