Creating a weekly profile in pandas

Time: 2020-09-16 10:31:10

Tags: python pandas dataframe datetime

I have this df:

import pandas as pd

df = pd.DataFrame({"on": [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]},
                  index=pd.date_range(start="2020-04-09 6:45", periods=30, freq="8H"))

and would like to create a weekly profile for the df['on'] column.

I can add the weekday and the time as follows:

df['day_name'] = df.index.day_name() 
df['time'] = df.index.time

which gives this df:

                    on  day_name    time
2020-04-09 06:45:00 1   Thursday    06:45:00
2020-04-09 14:45:00 1   Thursday    14:45:00
2020-04-09 22:45:00 1   Thursday    22:45:00
2020-04-10 06:45:00 1   Friday      06:45:00
2020-04-10 14:45:00 1   Friday      14:45:00
2020-04-10 22:45:00 0   Friday      22:45:00
2020-04-11 06:45:00 0   Saturday    06:45:00
2020-04-11 14:45:00 0   Saturday    14:45:00
2020-04-11 22:45:00 1   Saturday    22:45:00
2020-04-12 06:45:00 0   Sunday      06:45:00
2020-04-12 14:45:00 0   Sunday      14:45:00
2020-04-12 22:45:00 1   Sunday      22:45:00
2020-04-13 06:45:00 1   Monday      06:45:00
2020-04-13 14:45:00 0   Monday      14:45:00
2020-04-13 22:45:00 0   Monday      22:45:00
2020-04-14 06:45:00 0   Tuesday     06:45:00
2020-04-14 14:45:00 0   Tuesday     14:45:00
2020-04-14 22:45:00 1   Tuesday     22:45:00
2020-04-15 06:45:00 0   Wednesday   06:45:00
2020-04-15 14:45:00 1   Wednesday   14:45:00
2020-04-15 22:45:00 1   Wednesday   22:45:00
2020-04-16 06:45:00 0   Thursday    06:45:00
2020-04-16 14:45:00 0   Thursday    14:45:00
2020-04-16 22:45:00 0   Thursday    22:45:00
2020-04-17 06:45:00 1   Friday      06:45:00
2020-04-17 14:45:00 1   Friday      14:45:00
2020-04-17 22:45:00 1   Friday      22:45:00
2020-04-18 06:45:00 0   Saturday    06:45:00
2020-04-18 14:45:00 0   Saturday    14:45:00
2020-04-18 22:45:00 0   Saturday    22:45:00

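Can someone help me get the probability that df['on'] == 1 for a given time slot (for example, Tuesday 22:45)? Ideally this would cover the course of the whole week. (In this example, the probability for Thursday 22:45 is 1/2.)

Thanks a lot :)

2 answers:

Answer 0: (score: 2)

I considered two options from your question: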

1. Ratio per weekday: I calculated the ratio (probability) of rows with 'on' == 1 for each day of the week:

Ratio for each weekday:

df_2=pd.DataFrame()
df_2['Ones']=df[df['on']==1]['day_name'].value_counts()
df_2['All']=df['day_name'].value_counts()
df_2['Ratio']=df_2['Ones']/df_2['All']
df_2

Here is the output:

           Ones  All     Ratio
Friday        5    6  0.833333
Thursday      3    6  0.500000
Wednesday     2    3  0.666667
Monday        1    3  0.333333
Saturday      1    6  0.166667
Sunday        1    3  0.333333
Tuesday       1    3  0.333333

2. Ratio per weekday and time: here I calculated the ratio of rows where column "on" is 1 at time 'x' on day 'y':

Ratio for each weekday and time:

df_3 = df.groupby(['day_name', 'time']).agg({'on': 'count'})
df_3['ones'] = df.groupby(['day_name', 'time']).agg({'on': 'sum'})
df_3['Ratio'] = df_3['ones']/df_3['on']
df_3

Here is the output:

                     on  ones  Ratio
day_name  time
Friday    06:45:00    2     2    1.0
          14:45:00    2     2    1.0
          22:45:00    2     1    0.5
Monday    06:45:00    1     1    1.0
          14:45:00    1     0    0.0
          22:45:00    1     0    0.0
Saturday  06:45:00    2     0    0.0
          14:45:00    2     0    0.0
          22:45:00    2     1    0.5
Sunday    06:45:00    1     0    0.0
          14:45:00    1     0    0.0
          22:45:00    1     1    1.0
Thursday  06:45:00    2     1    0.5
          14:45:00    2     1    0.5
          22:45:00    2     1    0.5
Tuesday   06:45:00    1     0    0.0
          14:45:00    1     0    0.0
          22:45:00    1     1    1.0
Wednesday 06:45:00    1     0    0.0
          14:45:00    1     1    1.0
          22:45:00    1     1    1.0

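Plotting the probabilities

To answer your request, I had to modify the code above a little: I needed to sort the index into the requested weekday order, merge the two index levels into a single index, and convert that index to strings to avoid some problems when plotting. Here is the new code:

#Ratio for each day and time of the week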

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df_3 = df.groupby(['day_name', 'time']).agg({'on': 'count'})
df_3['ones'] = df.groupby(['day_name', 'time']).agg({'on': 'sum'})
df_3['Ratio'] = df_3['ones']/df_3['on']
df_3 = df_3.reindex(days, level=0)
df_3.index = [str(i) for i in (df_3.index.map('{0[0]} : {0[1]}'.format))]
df_3
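The weekday ordering can also be obtained without reindex by making day_name an ordered categorical before grouping, so that groupby sorts Monday to Sunday on its own. The sketch below is an alternative under that assumption, not the code used in this answer; profile and labels are illustrative names.

```python
import pandas as pd

# Rebuild the frame from the question.
on = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
      1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
df = pd.DataFrame(
    {"on": on},
    index=pd.date_range(start="2020-04-09 6:45", periods=30, freq="8h"),
)

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# An ordered categorical makes groupby sort by weekday rather than alphabetically.
df["day_name"] = pd.Categorical(df.index.day_name(), categories=days, ordered=True)
df["time"] = df.index.time

# Share of rows with on == 1 per (weekday, time) slot.
profile = df.groupby(["day_name", "time"], observed=True)["on"].mean()

# Flatten the two index levels into plain string labels for plotting.
labels = [f"{day} : {t}" for day, t in profile.index]
print(labels[:3])
```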

Now that the steps above are done, we can easily plot the ratio:

#Plot of the probabilities

import matplotlib.pyplot as plt
plt.figure()

plt.plot(df_3.index, df_3['Ratio'])
plt.xlabel('Date')
plt.xticks(rotation=90)
plt.title('Probability of "on"=1')

Here is the plot:

[Figure: line plot of the ratio of "on" = 1 for each weekday and time slot, Monday through Sunday]

Answer 1: (score: 1)

I believe you only need mean.
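A minimal sketch of how mean could be applied here, assuming the intent is the mean of on within each (day_name, time) group. Because on only holds 0 and 1, that mean equals the share of rows with on == 1. The names weekly and days, and the unstack step that lays the result out as a weekday-by-time grid, are illustrative additions.

```python
import datetime as dt

import pandas as pd

# Rebuild the frame from the question.
on = [1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
      1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
df = pd.DataFrame(
    {"on": on},
    index=pd.date_range(start="2020-04-09 6:45", periods=30, freq="8h"),
)
df["day_name"] = df.index.day_name()
df["time"] = df.index.time

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Mean of a 0/1 column per group is the probability of on == 1 in that slot.
weekly = (
    df.groupby(["day_name", "time"])["on"]
    .mean()
    .unstack("time")   # one column per time of day
    .reindex(days)     # rows in calendar order
)
print(weekly)
print(weekly.loc["Thursday", dt.time(22, 45)])
```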