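Solved
I imported a time series from a CSV file into a DataFrame. It has one column with the date/time and further columns with data. All values were imported as objects. Now I want to reindex on the date/time column (to "normalize" it to 2.5-minute intervals over a given period for later merging, filling NaNs with "nearest").
I tried to reindex with the 'object' index, but it failed: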
range=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b.set_index('datetimecolumn', inplace=True)
df_b=df_b.reindex(range, method='nearest')
ValueError: index must be monotonic increasing or decreasing
Converting datetimecolumn with to_datetime before setting it as the index did not help either:
range=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'])
df_b.set_index('t_index', inplace=True)
df_b=df_b.reindex(range, method='nearest')
The format of 'datetimecolumn' is, for example, "18.09.2017 07:28:33".
Thanks in advance.
Edit: there is not much more to it...
import pandas as pd
import numpy as np
df_b=pd.read_csv('b.csv',delimiter=";")
Data format:
[index] datetimecolumn data1 data2
0 18.09.2017 07:27:03 14,4 23333,222334
1 18.09.2017 07:29:33 13,1 23562,233223
2 18.09.2017 07:32:03 12,5 23234,244644
3 18.09.2017 07:34:33 13,5 23111,373561
4 18.09.2017 07:37:03 13,1 12311,373633
...
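Since the question mentions that every column arrives as object, here is a minimal sketch of one possible way to get numeric columns directly at load time. It assumes the file really uses ";" as separator and "," as decimal mark, as the sample rows suggest; csv_text and df_demo are hypothetical stand-ins for b.csv and df_b.

```python
import io
import pandas as pd

# Hypothetical stand-in for b.csv, mirroring the layout shown above.
csv_text = """datetimecolumn;data1;data2
18.09.2017 07:27:03;14,4;23333,222334
18.09.2017 07:29:33;13,1;23562,233223
18.09.2017 07:32:03;12,5;23234,244644
"""

# decimal="," tells pandas to read "14,4" as the float 14.4 rather than as text.
df_demo = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# The data columns are now floats; the date column is still text at this point.
print(df_demo.dtypes)
```

The date column still has to be converted separately with pd.to_datetime.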
The output I want (the data values come from other rows of the DataFrame, merged with some other DataFrames):
[index] data1 data2 data3
01.10.2017 00:00:00 13.4 13333.222334 13.443
01.10.2017 00:02:30 12.1 25562.233223 13.434
01.10.2017 00:05:00 13.5 35111.373561 13.435
01.10.2017 00:07:30 13.5 35111.373561 13.434
01.10.2017 00:10:00 10.1 12311.373633 13.432
...
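Edit 2: "range" changed to "range1"; the error is still there.
I don't understand it, because it worked for another df that I loaded earlier. The only difference is that there the date and time are in separate columns, and I used this code: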
df_boku['t_index']=pd.to_datetime(df_boku[['year','month','day']])+pd.to_timedelta(df_boku['hour MEZ'],unit='h')+pd.to_timedelta(df_boku['min'],unit='m')
df_boku.set_index('t_index', inplace=True)
df_boku=df_boku.reindex(range1,method='nearest')
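Everything else is the same, and there is no error (range was changed to range1 here; the original question above is unchanged).
Edit 3:
The index looks like this: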
DatetimeIndex(['2017-09-18 07:26:03', '2017-09-18 07:28:33',
'2017-09-18 07:31:03', '2017-09-18 07:33:33',
'2017-09-18 07:36:03', '2017-09-18 07:38:33',
'2017-09-18 07:41:03', '2017-09-18 07:43:33',
'2017-09-18 07:46:03', '2017-09-18 07:48:33',
...
'2017-11-18 08:31:03', '2017-11-18 08:33:33',
'2017-11-18 08:36:03', '2017-11-18 08:38:33',
'2017-11-18 08:41:03', '2017-11-18 08:43:33',
'2017-11-18 08:46:03', '2017-11-18 08:48:33',
'2017-11-18 08:51:03', '2017-11-18 08:53:33'],
dtype='datetime64[ns]', name='t_index', length=35172, freq=None)
Everything shown here is monotonic.
Full traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-9c25bfa7e198> in <module>()
10
11 range2=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
---> 12 df_b=df_b.reindex(range2, method='nearest')
13 #df_b.dtypes
14 #df_b.head()
C:\Anaconda\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
125 @wraps(func)
126 def wrapper(*args, **kwargs):
--> 127 return func(*args, **kwargs)
128
129 if not PY2:
C:\Anaconda\lib\site-packages\pandas\core\frame.py in reindex(self, *args, **kwargs)
2933 kwargs.pop('axis', None)
2934 kwargs.pop('labels', None)
-> 2935 return super(DataFrame, self).reindex(**kwargs)
2936
2937 @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
C:\Anaconda\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
3021 # perform the reindex on the axes
3022 return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3023 fill_value, copy).__finalize__(self)
3024
3025 def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,
C:\Anaconda\lib\site-packages\pandas\core\frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
2868 if index is not None:
2869 frame = frame._reindex_index(index, method, copy, level,
-> 2870 fill_value, limit, tolerance)
2871
2872 return frame
C:\Anaconda\lib\site-packages\pandas\core\frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
2876 new_index, indexer = self.index.reindex(new_index, method=method,
2877 level=level, limit=limit,
-> 2878 tolerance=tolerance)
2879 return self._reindex_with_indexers({0: [new_index, indexer]},
2880 copy=copy, fill_value=fill_value,
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in reindex(self, target, method, level, limit, tolerance)
2988 indexer = self.get_indexer(target, method=method,
2989 limit=limit,
-> 2990 tolerance=tolerance)
2991 else:
2992 if method is not None or limit is not None:
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_indexer(self, target, method, limit, tolerance)
2691 indexer = self._get_fill_indexer(target, method, limit, tolerance)
2692 elif method == 'nearest':
-> 2693 indexer = self._get_nearest_indexer(target, limit, tolerance)
2694 else:
2695 if tolerance is not None:
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_nearest_indexer(self, target, limit, tolerance)
2761 tuples).
2762 """
-> 2763 left_indexer = self.get_indexer(target, 'pad', limit=limit)
2764 right_indexer = self.get_indexer(target, 'backfill', limit=limit)
2765
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_indexer(self, target, method, limit, tolerance)
2689
2690 if method == 'pad' or method == 'backfill':
-> 2691 indexer = self._get_fill_indexer(target, method, limit, tolerance)
2692 elif method == 'nearest':
2693 indexer = self._get_nearest_indexer(target, limit, tolerance)
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_fill_indexer(self, target, method, limit, tolerance)
2719 else:
2720 indexer = self._get_fill_indexer_searchsorted(target, method,
-> 2721 limit)
2722 if tolerance is not None:
2723 indexer = self._filter_indexer_tolerance(target._values, indexer,
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_fill_indexer_searchsorted(self, target, method, limit)
2740 nonexact = (indexer == -1)
2741 indexer[nonexact] = self._searchsorted_monotonic(target[nonexact],
-> 2742 side)
2743 if side == 'left':
2744 # searchsorted returns "indices into a sorted array such that,
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
3548 return len(self) - pos
3549
-> 3550 raise ValueError('index must be monotonic increasing or decreasing')
3551
3552 def _get_loc_only_exact_matches(self, key):
ValueError: index must be monotonic increasing or decreasing
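Tested: there is definitely no NaT in the index, and the index is sorted.
Answer 0 (score: 0)
The problem with your code is that you named your date_range variable range. range is the name of a Python built-in, so you should avoid giving such names to user-defined variables. In fact, after changing range to range_b or something else, the code runs without error on these sample rows: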
import pandas as pd
df_b = pd.DataFrame([["18.09.2017 07:27:03", 14.4, 23333.222334],
["18.09.2017 07:29:33", 13.1, 23562.233223],
["18.09.2017 07:32:03", 12.5, 23234.244644],
["18.09.2017 07:34:33", 13.5, 23111.373561],
["18.09.2017 07:37:03", 13.1 ,12311.373633]],
columns = ["datetimecolumn","data1", "data2"])
range_b =pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'])
df_b.set_index('t_index', inplace=True)
df_b=df_b.reindex(range_b, method='nearest')
df_b
datetimecolumn data1 data2
2017-10-01 00:00:00 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:02:30 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:05:00 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:07:30 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:10:00 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:12:30 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:15:00 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:17:30 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:20:00 18.09.2017 07:37:03 13.1 12311.373633
2017-10-01 00:22:30 18.09.2017 07:37:03 13.1 12311.373633
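Note that in the output above every October row is filled with the last September row, because method='nearest' has no distance limit by default. A minimal sketch of how the tolerance argument of reindex could restrict this; the 5-minute tolerance and the names df_demo, filled and limited are assumptions chosen for illustration only.

```python
import pandas as pd

# Two sample rows from September, as in the answer above.
idx = pd.to_datetime(["2017-09-18 07:27:03", "2017-09-18 07:29:33"])
df_demo = pd.DataFrame({"data1": [14.4, 13.1]}, index=idx)

# Target grid in October, 2.5 minutes apart.
target = pd.date_range(start="2017-10-01", periods=3, freq=pd.Timedelta(minutes=2.5))

# Without a tolerance, the nearest row is used no matter how far away it is.
filled = df_demo.reindex(target, method="nearest")

# With a tolerance, rows farther away than 5 minutes are left as NaN.
limited = df_demo.reindex(target, method="nearest", tolerance=pd.Timedelta("5min"))

print(filled)
print(limited)
```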
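Answer 1 (score: 0)
As the error message says, the index of your data df_b contains values that are neither monotonically increasing nor decreasing. In other words, df_b['datetimecolumn'] is not sorted, but df.reindex(new_index, method='nearest') requires the index of df to be sorted for method='nearest' to work.
Solution:
df_b = df_b.sort_index().reindex(range, method='nearest')
Note that, contrary to the other answer, naming the variable range after the built-in function does not actually matter here. It should not affect the pandas code. It does, of course, shadow the built-in function range, which is why naming variables after built-ins is a bad idea.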
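A small sketch of the sorting point made in this answer, using made-up timestamps (df_demo and target are hypothetical names). It checks monotonicity first, then sorts before reindexing.

```python
import pandas as pd

# Deliberately unsorted timestamps (invented for this illustration).
idx = pd.to_datetime(["2017-10-01 00:06:00",
                      "2017-10-01 00:01:00",
                      "2017-10-01 00:03:00"])
df_demo = pd.DataFrame({"data1": [3.0, 1.0, 2.0]}, index=idx)
target = pd.date_range("2017-10-01", periods=3, freq=pd.Timedelta(minutes=2.5))

# Quick check before reindexing: is the index sorted at all?
print(df_demo.index.is_monotonic_increasing)

# Reindexing the unsorted frame is expected to fail with the ValueError from the question.
try:
    df_demo.reindex(target, method="nearest")
except ValueError as exc:
    print("reindex failed:", exc)

# Sorting the index first lets method="nearest" do its lookup.
result = df_demo.sort_index().reindex(target, method="nearest")
print(result)
```

If the index is sorted by sort_index() but the timestamps themselves were parsed wrongly, the reindex will run but pick values from the wrong rows, so it is worth inspecting the parsed dates as well.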
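Answer 2 (score: 0)
Partial "answer":
I found out what the cause is, but I do not have a final solution yet:
My CSV contains German dates, which means day and month are swapped relative to what the parser assumed. The to_datetime function converts: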
30.09.2017 23:56:03 2017-09-30 23:56:03
30.09.2017 23:58:33 2017-09-30 23:58:33
01.10.2017 00:01:03 2017-01-10 00:01:03
01.10.2017 00:03:33 2017-01-10 00:03:33
(Note the transition from September to October: 1 October is read as 10 January.)
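The two readings of the same string can be reproduced deterministically by spelling out both formats. This is only a sketch to illustrate the ambiguity; month_first and day_first are names invented here.

```python
import pandas as pd

text = "01.10.2017 00:01:03"

# Reading the first field as the month gives 10 January.
month_first = pd.to_datetime(text, format="%m.%d.%Y %H:%M:%S")

# Reading the first field as the day gives 1 October, which is what the German format means.
day_first = pd.to_datetime(text, format="%d.%m.%Y %H:%M:%S")

print(month_first)  # 2017-01-10 00:01:03
print(day_first)    # 2017-10-01 00:01:03
```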
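Edit: final solution
df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'],format='%d.%m.%Y %H:%M:%S')
That did the trick: the explicit format converts the German day.month.year order to year-month-day correctly. The swapped day and month were the reason the index was "unsorted" after calling to_datetime.
Also, the "funny" thing is that whenever the day is > 12 (so it cannot be a month) the conversion is correct, and the file starts in the middle of a month, so the swap is not visible when looking at .head()...
Thanks for your help!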
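Putting the pieces together, a minimal end-to-end sketch under the assumption that the dates are always in day.month.year order with dots as separators. The rows and the names df_demo, target and result are made up for illustration.

```python
import pandas as pd

# A few rows spanning the September/October boundary (values invented).
df_demo = pd.DataFrame({
    "datetimecolumn": ["30.09.2017 23:56:03", "30.09.2017 23:58:33",
                       "01.10.2017 00:01:03", "01.10.2017 00:03:33"],
    "data1": [14.4, 13.1, 12.5, 13.5],
})

# Parse with an explicit day-first format so 01.10. is 1 October, not 10 January.
df_demo["t_index"] = pd.to_datetime(df_demo["datetimecolumn"],
                                    format="%d.%m.%Y %H:%M:%S")
df_demo = df_demo.set_index("t_index")

# With correctly parsed dates, the index is sorted as the file order suggests.
print(df_demo.index.is_monotonic_increasing)

# Reindex onto a regular 2.5-minute grid, taking the nearest observation.
target = pd.date_range("2017-10-01", periods=2, freq=pd.Timedelta(minutes=2.5))
result = df_demo.reindex(target, method="nearest")
print(result["data1"])
```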