Question

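I have a large time-series dataset that records temperature over time. Each row has a datetime and the corresponding temperature. I want to work out the percentage of time the temperature spends within a specific range.

I want to iterate over this dataframe and compute, for each day, the percentage of readings between 10 and 20 degrees. This should produce a new dataframe holding, for each day, the percentage of time the device was in range. The point is to see how the in-range percentage changes from day to day, rather than computing a single percentage for the whole dataframe.

How can I achieve this more efficiently than what I have tried so far?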

df1 = df[(df['date'] > '2019-01-01') & (df['date'] <= '2019-01-02')]
df2 = df[(df['date'] > '2019-01-02') & (df['date'] <= '2019-01-03')]
df3 = df[(df['date'] > '2019-01-03') & (df['date'] <= '2019-01-04')]
df4 = df[(df['date'] > '2019-01-04') & (df['date'] <= '2019-01-05')]
df5 = df[(df['date'] > '2019-01-05') & (df['date'] <= '2019-01-06')]
df6 = df[(df['date'] > '2019-01-06') & (df['date'] <= '2019-01-07')]
df7 = df[(df['date'] > '2019-01-07') & (df['date'] <= '2019-01-08')]

condition1 = df1[(df1.temp >= 10.0) & (df1.temp <=20.0)]
condition2 = df2[(df2.temp >= 10.0) & (df2.temp <=20.0)]
condition3 = df3[(df3.temp >= 10.0) & (df3.temp <=20.0)]
condition4 = df4[(df4.temp >= 10.0) & (df4.temp <=20.0)]
condition5 = df5[(df5.temp >= 10.0) & (df5.temp <=20.0)]
condition6 = df6[(df6.temp >= 10.0) & (df6.temp <=20.0)]
condition7 = df7[(df7.temp >= 10.0) & (df7.temp <=20.0)]

percentage1 = (len(condition1)/len(df1))*100
percentage2 = (len(condition2)/len(df2))*100
percentage3 = (len(condition3)/len(df3))*100
percentage4 = (len(condition4)/len(df4))*100
percentage5 = (len(condition5)/len(df5))*100
percentage6 = (len(condition6)/len(df6))*100
percentage7 = (len(condition7)/len(df7))*100

Answer 1

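Assuming your data is evenly sampled and the dates are the index of the dataframe (resample needs a DatetimeIndex, e.g. after df = df.set_index('date')), you can try the following. It uses the temp column and the inclusive 10 to 20 bounds from the question, and multiplies by 100 to get a percentage:

df2 = df[(df['temp']>=10)&(df['temp']<=20)]['temp'].resample('1D').count().divide(df['temp'].resample('1D').count())*100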
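The same idea can also be written by averaging a boolean mask, which avoids dividing two separately resampled counts. The following is only a minimal sketch under the same even-sampling assumption; the sample data and the names in_range and daily_percent are made up for illustration and are not from the original answer.

```python
import pandas as pd

# Hypothetical sample data: one reading every 6 hours over two days.
idx = pd.date_range("2019-01-01", periods=8, freq=pd.Timedelta(hours=6))
df = pd.DataFrame(
    {"temp": [5.0, 12.0, 15.0, 25.0, 10.0, 20.0, 18.0, 11.0]},
    index=idx,
)

# True where the reading lies in [10, 20]; between() includes both ends by default.
in_range = df["temp"].between(10, 20)

# The mean of a boolean Series is the fraction of True values,
# so the daily mean times 100 is the daily in-range percentage.
daily_percent = in_range.resample("1D").mean() * 100
print(daily_percent)
```

With this sample data the first day has two of four readings in range and the second day has all four in range.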

Answer 2

Something like this might work for you:

import pandas as pd

df['date']=pd.to_datetime(df['date']) #not necessary if your dates are already in datetime format
df.set_index('date',inplace=True) #make date the index

all_days=df.index.normalize().unique() #get all unique days in timeseries

df2=pd.DataFrame(columns=['date','percent']) #create new df to store results
df2['date']=all_days #make date column equal to the unique days
df2.set_index('date',inplace=True) #make date column the index

for i,row in df2.iterrows(): #iterate through each row of df2
    iloc = df2.index.get_loc(i) #get index location
    daily_df = df[(df.index >= df2.index[iloc]) & (df.index < df2.index[iloc+1])] #get reduced df for that day (assuming it starts at midnight and ends at 23:59:59)
    total_count = daily_df.shape[0] #number of temp readings that day
    above_count = daily_df[(daily_df['temp'] >= 10) & (daily_df['temp'] <= 20)].values.shape[0] #number of temp readings between 10 and 20
    df2.loc[i, 'percent'] = 100*above_count/total_count #assign the percentage of values between 10 and 20 (.loc avoids chained assignment, which may not write back to df2)

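There is surely a way to tighten this code with pandas functions I don't know about, but it is a good start.

You will have to handle the last day separately, because it has no end date: df2.index[iloc+1] does not exist for the final row.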

Edit

Replace the daily_df line with:

daily_df = df[df.index.normalize() == df2.index[iloc]]

and it will no longer crash on the last date.
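As for tightening the loop with built-in pandas functions, one possible approach is to group a boolean mask by calendar day. This is a sketch rather than the answerer's method; the sample data and the names in_range and percent are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data with 'date' and 'temp' columns, as in the question.
df = pd.DataFrame({
    "date": ["2019-01-01 00:00", "2019-01-01 12:00",
             "2019-01-02 06:00", "2019-01-02 18:00",
             "2019-01-04 09:00"],
    "temp": [9.0, 15.0, 10.0, 20.0, 30.0],
})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# True for readings between 10 and 20 degrees, inclusive.
in_range = (df["temp"] >= 10) & (df["temp"] <= 20)

# Group by calendar day (each timestamp truncated to midnight) and
# average the mask to get the fraction of in-range readings per day.
percent = in_range.groupby(df.index.normalize()).mean() * 100
df2 = percent.to_frame("percent")
print(df2)
```

Grouping this way needs no special case for the last day. It also returns rows only for days that actually have readings (2019-01-03 is absent here), whereas resampling would return a row for every day in the span.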

Percentage of a time series within a specific range, per day
