Python: calculating the median of a series of time data

Asked: 2018-08-21 09:08:46

Tags: python pandas numpy

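I need to calculate the median of an array of time data in Python. I can manage it with numeric values, but with values in datetime format it is really troublesome. Can someone explain how to do this? Some data format conversion seems to be needed, but I do not know how to do it. Which of the two (numpy or pandas) is the more appropriate and efficient way to calculate the median?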

>>> import pandas as pd
>>> import numpy as np

Create the data frames:

>>> df1 = pd.DataFrame({'Value': [1, 2, 3]})
>>> df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})

NumPy numeric median:

>>> numpy_numeric_median = np.median(df1)
>>> print(numpy_numeric_median)
2.0

Pandas numeric median:

>>> pandas_numeric_median = df1['Value'].median()
>>> print(pandas_numeric_median)
2.0

NumPy time median:

>>> numpy_time_median = np.median(df2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'

>>> df2_datetime_format = np.array(pd.to_datetime(df2['Value']), dtype=np.datetime64)
array(['2018-08-21T02:00:00.000000000', '2018-08-21T03:00:00.000000000', '2018-08-21T04:00:00.000000000'], dtype='datetime64[ns]')

>>> numpy_time_median = np.median(df2_datetime_format)
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')
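
The error above arises because two datetime64 values cannot be added together, while differences between them (timedelta64) can be treated as ordinary integers. A minimal sketch of one possible workaround follows, assuming the array contains no NaT values. The helper name `datetime_median` is hypothetical and is not part of NumPy.

```python
import numpy as np


def datetime_median(values):
    """Median of datetime64 values (hypothetical helper, not a NumPy API).

    Assumes the input contains no NaT entries.
    """
    values = np.asarray(values, dtype="datetime64[ns]")
    start = values.min()
    # Nanoseconds elapsed since the earliest value, as plain integers.
    offsets_ns = (values - start).astype("int64")
    # np.median returns a float; round it back to whole nanoseconds.
    median_offset = int(round(float(np.median(offsets_ns))))
    return start + np.timedelta64(median_offset, "ns")


times = np.array(
    ["2018-08-21T02:00:00", "2018-08-21T03:00:00", "2018-08-21T04:00:00"],
    dtype="datetime64[ns]",
)
median_time = datetime_median(times)
print(median_time)
```

Working with offsets from the earliest value, rather than raw nanoseconds since the epoch, keeps the numbers small enough that the float returned by `np.median` is unlikely to lose precision.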

Pandas time median:

>>> pandas_time_median = df2['Value'].median()
TypeError: could not convert string to float: '04:00:00'

>>> df2_datetime_format = pd.to_datetime(df2['Value'])
0   2018-08-21 02:00:00
1   2018-08-21 03:00:00
2   2018-08-21 04:00:00
Name: Value, dtype: datetime64[ns]

>>> pandas_time_median = df2_datetime_format['Value'].median()
TypeError: an integer is required

>>> pandas_time_median = df2_datetime_format.median()
TypeError: reduction operation 'median' not allowed for this dtype
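
Since the strings in `df2` are times of day rather than full dates, one option that may avoid the datetime64 restriction is to parse them as durations. The sketch below assumes a reasonably recent pandas version, in which `pd.to_timedelta` accepts `HH:MM:SS` strings and `Series.median` is supported for timedelta64 data. The variable names are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({"Value": ["02:00:00", "03:00:00", "04:00:00"]})

# Parse the 'HH:MM:SS' strings as durations since midnight.
durations = pd.to_timedelta(df2["Value"])

# The median of a timedelta64 Series is returned as a pandas Timedelta.
median_duration = durations.median()
print(median_duration)
```

The result is a `Timedelta` (a duration), not a `Timestamp`, so it carries no calendar date.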

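PS: The earlier thread 'median of panda datetime64 column' does not solve my problem, because it gives me errors as well. Frankly, it is strange to call Python a major and powerful data science tool when even a median calculation is this cumbersome.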

>>> median = math.floor(df2['Value'].astype('int64').median())
ValueError: invalid literal for int() with base 10: '02:00:00'

>>> median = df2['Value'].astype('datetime64[ns]').quantile(.5)
Timestamp('2018-08-21 03:00:00') => right answer but it's only usable when the length of the data frame is even.
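
On the concern about the number of rows: for an even count, the median is conventionally the midpoint of the two middle values. A rough sketch with four hypothetical rows is shown below, again assuming a recent pandas version and treating the strings as durations since midnight. The reference date is an arbitrary choice made only to convert the result back to a time of day.

```python
import datetime

import pandas as pd

# Hypothetical even-length sample, not taken from the question.
values = pd.Series(["02:00:00", "03:00:00", "04:00:00", "05:00:00"])

# Midpoint of the two middle durations (03:00:00 and 04:00:00).
median_duration = pd.to_timedelta(values).median()

# Attach the duration to an arbitrary reference date to get a time of day.
reference_date = pd.Timestamp("2018-08-21")
median_time_of_day = (reference_date + median_duration).time()
print(median_time_of_day)
```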

0 answers:

No answers