I'd like to compute the percentage of speed observations that fall into each of a set of ranges. For example, 5% of the speed data is between 0 and 5, 10% is between 5 and 10, etc. I'd also like the ability to resample the output to any frequency (entire period, daily, monthly, etc.).
I have a DataFrame that looks like this:
import numpy as np
import pandas as pd

dates = pd.date_range('2017-01-01', '2018-01-01', freq='H')
df = pd.DataFrame({'id': '1234',
                   'datetime': dates,
                   'speed': np.random.randint(0, 5000, len(dates))})
df['speed'] = df['speed'] / 100.0
speedintervals = [0,3,5,9,15,21]
frequency = 'D' # for daily aggregation of data
# or frequency = 'P' for entire period
The DataFrame looks like this:
datetime id speed
0 2017-01-01 00:00:00 1234 17.08
1 2017-01-01 01:00:00 1234 16.30
2 2017-01-01 02:00:00 1234 12.74
3 2017-01-01 03:00:00 1234 39.89
4 2017-01-01 04:00:00 1234 34.33
5 2017-01-01 05:00:00 1234 22.76
6 2017-01-01 06:00:00 1234 13.72
...
I imagine I'd set datetime as the index and do some kind of resample, but I'm not sure how to build out the data. Ultimately, I want the data to look like this:
For entire period:
id start_date end_date 0<=3 3<=9 9<=15 15<=21 >21
1234 1/1/17 0:00 1/1/18 23:00 0.49 0.13 0.18 0.17 0.00
For daily frequency:
id periodEnd 0<=3 3<=9 9<=15 15<=21 >21
1234 1/1/18 0.49 0.13 0.18 0.17 0.00
1234 1/2/18 0.50 0.14 0.17 0.16 0.00
1234 1/3/18 0.25 0.10 0.25 0.25 0.15
...
Any thoughts?
Answer (score: 1)
Here's one way to do it.
speedintervals = [0, 3, 5, 9, 15, 21, 100]  # extra upper edge so speeds above 21 still get a bin
df["interval"] = pd.cut(df["speed"], bins=speedintervals)

# count observations per day and per speed interval
result = (df.groupby([pd.Grouper(key="datetime", freq="D"), "interval"])["interval"]
            .count()
            .unstack("interval")
            .fillna(0))
You could use a pivot table instead of a groupby, but grouping with pd.Grouper makes the date aggregation explicit.
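For comparison, here is a minimal self-contained sketch of the pivot-table variant. The 48-hour sample, the fixed seed, and the 100 upper bin edge are illustrative assumptions, not part of the original question:

```python
import numpy as np
import pandas as pd

# small seeded sample: two full days of hourly speeds in (0, 50)
dates = pd.date_range('2017-01-01', periods=48, freq='h')
df = pd.DataFrame({'id': '1234',
                   'datetime': dates,
                   'speed': np.random.default_rng(0).integers(1, 5000, len(dates)) / 100.0})
df['interval'] = pd.cut(df['speed'], bins=[0, 3, 5, 9, 15, 21, 100])

# rows = day, columns = speed interval, cells = number of observations
result = df.pivot_table(index=pd.Grouper(key='datetime', freq='D'),
                        columns='interval',
                        values='speed',
                        aggfunc='count',
                        fill_value=0)
print(result)
```

Both approaches produce the same counts; the pivot table just packs the grouping, reshaping, and fill into one call.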
If you want each row normalized to fractions, divide by the row totals:
result.div(result.sum(axis=1), axis="index")
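An end-to-end daily sketch of the count-then-normalize steps, with seeded data so the row sums are checkable (the 72-hour sample, seed, and 100 upper bin edge are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# three full days of hourly speeds in (0, 50)
dates = pd.date_range('2017-01-01', periods=72, freq='h')
df = pd.DataFrame({'id': '1234',
                   'datetime': dates,
                   'speed': np.random.default_rng(1).integers(1, 5000, len(dates)) / 100.0})

speedintervals = [0, 3, 5, 9, 15, 21, 100]
df['interval'] = pd.cut(df['speed'], bins=speedintervals)

# daily counts per interval, then divide each row by its total
counts = (df.groupby([pd.Grouper(key='datetime', freq='D'), 'interval'], observed=False)['interval']
            .count()
            .unstack('interval')
            .fillna(0))
shares = counts.div(counts.sum(axis=1), axis='index')
print(shares)
```

Each row of `shares` sums to 1, i.e. the per-day fraction of observations falling in each speed interval.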
For the whole time period, skip the grouper and bin the speeds directly (normalize=True gives fractions instead of counts):
pd.cut(df["speed"], bins=speedintervals).value_counts(normalize=True).sort_index()
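Putting the whole-period case together as a self-contained sketch (the seed and the 100 upper bin edge are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# a year of hourly speeds in (0, 50)
dates = pd.date_range('2017-01-01', '2018-01-01', freq='h')
df = pd.DataFrame({'id': '1234',
                   'datetime': dates,
                   'speed': np.random.default_rng(7).integers(1, 5000, len(dates)) / 100.0})

speedintervals = [0, 3, 5, 9, 15, 21, 100]
# fraction of all observations in each speed interval, in interval order
whole = (pd.cut(df['speed'], bins=speedintervals)
           .value_counts(normalize=True)
           .sort_index())
print(whole)
```

The result is one row per interval, and the fractions sum to 1 over the entire period.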