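I need to compute the median of an array of time values in Python. I can manage it with numeric values, but with data in datetime format it is really troublesome. Can someone explain how to do this? Some data format conversion seems to be needed, but I don't know how to do it. Which of the two (numpy or pandas) is the more suitable and efficient way to compute the median?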
>>> import pandas as pd
>>> import numpy as np
Create the data frames:
>>> df1 = pd.DataFrame({'Value': [1, 2, 3]})
>>> df2 = pd.DataFrame({'Value': ['02:00:00', '03:00:00', '04:00:00']})
Numpy numeric median:
>>> numpy_numeric_median = np.median(df1)
>>> print(numpy_numeric_median)
2.0
Pandas numeric median:
>>> pandas_numeric_median = df1['Value'].median()
>>> print(pandas_numeric_median)
2.0
Numpy time median:
>>> numpy_time_median = np.median(df2)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
>>> df2_datetime_format = np.array(pd.to_datetime(df2['Value']), dtype=np.datetime64)
array(['2018-08-21T02:00:00.000000000', '2018-08-21T03:00:00.000000000', '2018-08-21T04:00:00.000000000'], dtype='datetime64[ns]')
>>> numpy_time_median = np.median(df2_datetime_format)
TypeError: ufunc add cannot use operands with types dtype('<M8[ns]') and dtype('<M8[ns]')
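One possible workaround on the numpy side, offered as a sketch rather than the canonical method: `np.median` needs to add values, and two `datetime64` values cannot be added, but their offsets from a reference time are plain durations that convert to integers. The sketch below pins the times to the date shown in the output above; the names `base` and `offsets_ns` are only illustrative.

```python
import numpy as np

# Same three times as in the question, pinned to the date from the output above.
times = np.array(
    ["2018-08-21T02:00:00", "2018-08-21T03:00:00", "2018-08-21T04:00:00"],
    dtype="datetime64[ns]",
)

# Measure every value relative to the earliest one. The differences are
# timedelta64[ns], which can be cast to plain int64 nanosecond counts.
base = times.min()
offsets_ns = (times - base).astype("int64")

# np.median is happy with integers; it returns a float.
median_offset_ns = np.median(offsets_ns)

# Turn the median offset back into a datetime64 by adding it to the base.
numpy_time_median = base + np.timedelta64(int(round(median_offset_ns)), "ns")
print(numpy_time_median)
```

Working with offsets instead of raw epoch nanoseconds keeps the numbers small, so the float returned by `np.median` should not lose precision for typical data.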
Pandas time median:
>>> pandas_time_median = df2['Value'].median()
TypeError: could not convert string to float: '04:00:00'
>>> df2_datetime_format = pd.to_datetime(df2['Value'])
0 2018-08-21 02:00:00
1 2018-08-21 03:00:00
2 2018-08-21 04:00:00
Name: Value, dtype: datetime64[ns]
>>> pandas_time_median = df2_datetime_format['Value'].median()
TypeError: an integer is required
>>> pandas_time_median = df2_datetime_format.median()
TypeError: reduction operation 'median' not allowed for this dtype
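On the pandas side, one option that might fit this data, assuming the strings are meant as times of day or durations rather than full dates: parse them with `pd.to_timedelta` instead of `pd.to_datetime`. As far as I know, `median()` is allowed on a timedelta Series, so no manual integer conversion is needed. The name `durations` is just for illustration.

```python
import pandas as pd

df2 = pd.DataFrame({"Value": ["02:00:00", "03:00:00", "04:00:00"]})

# Parse the HH:MM:SS strings as durations (timedelta64) rather than as
# timestamps, so that no calendar date gets attached to them.
durations = pd.to_timedelta(df2["Value"])

# median() on a timedelta Series returns a pandas Timedelta.
pandas_time_median = durations.median()
print(pandas_time_median)
```

If a timestamp is needed afterwards, the resulting `Timedelta` can be added to any reference date of your choosing.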
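PS! The earlier thread 'median of panda datetime64 column' does not solve my problem, because it also gives me errors. Honestly, it is strange to call Python a major and powerful data science tool when even a median calculation is this cumbersome.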
>>> import math
>>> median = math.floor(df2['Value'].astype('int64').median())
ValueError: invalid literal for int() with base 10: '02:00:00'
>>> median = df2['Value'].astype('datetime64[ns]').quantile(.5)
Timestamp('2018-08-21 03:00:00') => right answer but it's only usable when the length of the data frame is even.
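To check how `quantile(.5)` behaves when the number of rows is even, here is a small sketch with four made-up rows (the fourth time, 05:00:00, is not from the original data). My understanding is that the default linear interpolation returns the midpoint of the two middle values, while `interpolation='lower'` returns an element that actually occurs in the data. Treat this as an experiment to run on your own pandas version, not as a guarantee.

```python
import pandas as pd

# Four rows instead of three; the last value is hypothetical.
times = pd.Series(pd.to_datetime([
    "2018-08-21 02:00:00",
    "2018-08-21 03:00:00",
    "2018-08-21 04:00:00",
    "2018-08-21 05:00:00",
]))

# Default (linear) interpolation: midpoint between the two middle values.
median_linear = times.quantile(0.5)

# 'lower' picks the lower of the two middle values, so the result is a
# timestamp that is really present in the Series.
median_lower = times.quantile(0.5, interpolation="lower")

print(median_linear)
print(median_lower)
```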