Python: importing a time series from CSV and reindexing it

Date: 2018-06-16 13:05:27

Tags: python pandas dataframe time-series

Solved

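I imported a time series from a CSV file into a DataFrame. It has one column with the date/time and several data columns, and all values are imported as object. Now I want to reindex on the date/time column, i.e. "normalize" it to a 2.5-minute interval over a given period so I can merge it with other data later, filling the resulting NaNs with the "nearest" value.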

I can't reindex the 'object' index; this fails:

range=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b.set_index('datetimecolumn', inplace=True)
df_b=df_b.reindex(range, method='nearest')

ValueError: index must be monotonic increasing or decreasing
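The error can be reproduced with a few rows. This is a minimal sketch on hypothetical data whose DatetimeIndex is out of order; reindex(..., method='nearest') needs a monotonic index, so pandas raises the same ValueError.

```python
import pandas as pd

# Hypothetical data: the third timestamp is out of order, so the index is
# neither monotonic increasing nor monotonic decreasing.
idx = pd.DatetimeIndex(["2017-09-30 23:56:03", "2017-09-30 23:58:33",
                        "2017-01-10 00:01:03", "2017-10-01 00:03:33"])
df = pd.DataFrame({"data1": [14.4, 13.1, 12.5, 13.5]}, index=idx)

# Target grid with 2.5-minute spacing (named grid to avoid shadowing range).
grid = pd.date_range(start="2017-10-01", end="2017-10-01 00:10", freq="2.5min")

try:
    df.reindex(grid, method="nearest")
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # index must be monotonic increasing or decreasing

print(df.index.is_monotonic_increasing)  # False
```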

Converting datetimecolumn with to_datetime first and using the result as the index doesn't help either:

range=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'])
df_b.set_index('t_index', inplace=True)
df_b=df_b.reindex(range, method='nearest')

The values in 'datetimecolumn' look like, for example, "18.09.2017 07:28:33".

Thanks in advance.

Edit: there isn't much more to it:

import pandas as pd
import numpy as np
df_b=pd.read_csv('b.csv',delimiter=";")

Data format:

[index] datetimecolumn      data1   data2
0       18.09.2017 07:27:03 14,4    23333,222334
1       18.09.2017 07:29:33 13,1    23562,233223
2       18.09.2017 07:32:03 12,5    23234,244644
3       18.09.2017 07:34:33 13,5    23111,373561
4       18.09.2017 07:37:03 13,1    12311,373633
...
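The sample uses decimal commas (14,4), which would explain why every column comes in as object. A minimal sketch, assuming the raw file is semicolon-separated with these values (the CSV text below is a hypothetical reconstruction from the sample): passing decimal=',' to read_csv makes data1 and data2 numeric.

```python
import io
import pandas as pd

# Hypothetical raw CSV text reconstructed from the sample above
# (semicolon delimiter, decimal commas, German day.month.year dates).
raw = io.StringIO(
    "datetimecolumn;data1;data2\n"
    "18.09.2017 07:27:03;14,4;23333,222334\n"
    "18.09.2017 07:29:33;13,1;23562,233223\n"
    "18.09.2017 07:32:03;12,5;23234,244644\n"
)

# decimal="," tells the parser that "14,4" means 14.4, so the data
# columns are read as float64 instead of object.
df_b = pd.read_csv(raw, delimiter=";", decimal=",")
print(df_b.dtypes)
```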

Desired output (data values taken from other rows of the DataFrame, merged with some other DataFrames):

[index]             data1   data2         data3
01.10.2017 00:00:00 13.4    13333.222334  13.443
01.10.2017 00:02:30 12,1    25562.233223  13.434
01.10.2017 00:05:00 13.5    35111.373561  13.435
01.10.2017 00:07:30 13.5    35111.373561  13.434
01.10.2017 00:10:00 10.1    12311.373633  13.432
...

Edit 2: Renamed "range" to "range1"; the error still occurs.

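I don't understand it, because the same thing works with another df that I loaded earlier. The only difference is that there the date/time is split across separate columns, and I built the index with this code: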

df_boku['t_index']=pd.to_datetime(df_boku[['year','month','day']])+pd.to_timedelta(df_boku['hour MEZ'],unit='h')+pd.to_timedelta(df_boku['min'],unit='m') 
df_boku.set_index('t_index', inplace=True)
df_boku=df_boku.reindex(range1,method='nearest')

Everything else is identical, and there is no error (range was renamed to range1, which does not change the original problem).
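A minimal sketch of why the separate-column approach behaves differently, on a hypothetical frame mimicking df_boku: the year, month and day are already numbers, so there is no string for to_datetime to misread, and the index comes out in the right order.

```python
import pandas as pd

# Hypothetical frame mimicking df_boku: date parts in separate numeric columns.
df_boku = pd.DataFrame({
    "year": [2017, 2017, 2017],
    "month": [9, 10, 10],
    "day": [30, 1, 1],
    "hour MEZ": [23, 0, 0],
    "min": [55, 0, 5],
    "value": [1.0, 2.0, 3.0],
})

# Same construction as in the question (unit="min" is used here as an
# equivalent spelling of unit="m"). Numeric month/day cannot be swapped
# the way the string "01.10.2017" can.
df_boku["t_index"] = (pd.to_datetime(df_boku[["year", "month", "day"]])
                      + pd.to_timedelta(df_boku["hour MEZ"], unit="h")
                      + pd.to_timedelta(df_boku["min"], unit="min"))
df_boku = df_boku.set_index("t_index")
print(df_boku.index.is_monotonic_increasing)  # True
```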

Edit 3:

The index looks like this:

DatetimeIndex(['2017-09-18 07:26:03', '2017-09-18 07:28:33',
               '2017-09-18 07:31:03', '2017-09-18 07:33:33',
               '2017-09-18 07:36:03', '2017-09-18 07:38:33',
               '2017-09-18 07:41:03', '2017-09-18 07:43:33',
               '2017-09-18 07:46:03', '2017-09-18 07:48:33',
               ...
               '2017-11-18 08:31:03', '2017-11-18 08:33:33',
               '2017-11-18 08:36:03', '2017-11-18 08:38:33',
               '2017-11-18 08:41:03', '2017-11-18 08:43:33',
               '2017-11-18 08:46:03', '2017-11-18 08:48:33',
               '2017-11-18 08:51:03', '2017-11-18 08:53:33'],
              dtype='datetime64[ns]', name='t_index', length=35172, freq=None)

Everything here is monotonic.
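The printed index shows only the first and last ten entries, so it cannot prove that the whole index is monotonic. A minimal sketch for checking this programmatically and locating out-of-order entries, using a small hypothetical index in place of df_b.index:

```python
import pandas as pd

# Hypothetical stand-in for df_b.index; the third entry is out of order.
idx = pd.DatetimeIndex(["2017-09-30 23:56:03", "2017-09-30 23:58:33",
                        "2017-01-10 00:01:03", "2017-01-10 00:03:33"],
                       name="t_index")

print(idx.is_monotonic_increasing)  # False

# Differences to the previous entry; the first one is NaT and compares False.
steps = idx.to_series().diff()
bad_positions = [i for i, step in enumerate(steps) if step < pd.Timedelta(0)]
print(bad_positions)  # [2]
print(idx[bad_positions[0] - 1], "->", idx[bad_positions[0]])
```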

Full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-9c25bfa7e198> in <module>()
     10 
     11 range2=pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
---> 12 df_b=df_b.reindex(range2, method='nearest')
     13 #df_b.dtypes
     14 #df_b.head()

C:\Anaconda\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    125         @wraps(func)
    126         def wrapper(*args, **kwargs):
--> 127             return func(*args, **kwargs)
    128 
    129         if not PY2:

C:\Anaconda\lib\site-packages\pandas\core\frame.py in reindex(self, *args, **kwargs)
   2933         kwargs.pop('axis', None)
   2934         kwargs.pop('labels', None)
-> 2935         return super(DataFrame, self).reindex(**kwargs)
   2936 
   2937     @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)

C:\Anaconda\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3021         # perform the reindex on the axes
   3022         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3023                                   fill_value, copy).__finalize__(self)
   3024 
   3025     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

C:\Anaconda\lib\site-packages\pandas\core\frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   2868         if index is not None:
   2869             frame = frame._reindex_index(index, method, copy, level,
-> 2870                                          fill_value, limit, tolerance)
   2871 
   2872         return frame

C:\Anaconda\lib\site-packages\pandas\core\frame.py in _reindex_index(self, new_index, method, copy, level, fill_value, limit, tolerance)
   2876         new_index, indexer = self.index.reindex(new_index, method=method,
   2877                                                 level=level, limit=limit,
-> 2878                                                 tolerance=tolerance)
   2879         return self._reindex_with_indexers({0: [new_index, indexer]},
   2880                                            copy=copy, fill_value=fill_value,

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in reindex(self, target, method, level, limit, tolerance)
   2988                     indexer = self.get_indexer(target, method=method,
   2989                                                limit=limit,
-> 2990                                                tolerance=tolerance)
   2991                 else:
   2992                     if method is not None or limit is not None:

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_indexer(self, target, method, limit, tolerance)
   2691             indexer = self._get_fill_indexer(target, method, limit, tolerance)
   2692         elif method == 'nearest':
-> 2693             indexer = self._get_nearest_indexer(target, limit, tolerance)
   2694         else:
   2695             if tolerance is not None:

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_nearest_indexer(self, target, limit, tolerance)
   2761         tuples).
   2762         """
-> 2763         left_indexer = self.get_indexer(target, 'pad', limit=limit)
   2764         right_indexer = self.get_indexer(target, 'backfill', limit=limit)
   2765 

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_indexer(self, target, method, limit, tolerance)
   2689 
   2690         if method == 'pad' or method == 'backfill':
-> 2691             indexer = self._get_fill_indexer(target, method, limit, tolerance)
   2692         elif method == 'nearest':
   2693             indexer = self._get_nearest_indexer(target, limit, tolerance)

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_fill_indexer(self, target, method, limit, tolerance)
   2719         else:
   2720             indexer = self._get_fill_indexer_searchsorted(target, method,
-> 2721                                                           limit)
   2722         if tolerance is not None:
   2723             indexer = self._filter_indexer_tolerance(target._values, indexer,

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _get_fill_indexer_searchsorted(self, target, method, limit)
   2740         nonexact = (indexer == -1)
   2741         indexer[nonexact] = self._searchsorted_monotonic(target[nonexact],
-> 2742                                                          side)
   2743         if side == 'left':
   2744             # searchsorted returns "indices into a sorted array such that,

C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
   3548             return len(self) - pos
   3549 
-> 3550         raise ValueError('index must be monotonic increasing or decreasing')
   3551 
   3552     def _get_loc_only_exact_matches(self, key):

ValueError: index must be monotonic increasing or decreasing

Checked:
- there are definitely no NaT values in the index
- the index is sorted

3 answers:

Answer 0 (score: 0)

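The problem with your code is that you named the date_range variable range. range is a Python built-in, so you should avoid using that name for a user-defined variable. In fact, renaming range to range_b (or anything else) makes your code work perfectly!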

import pandas as pd
df_b = pd.DataFrame([["18.09.2017 07:27:03", 14.4, 23333.222334],
                    ["18.09.2017 07:29:33", 13.1,    23562.233223],
                    ["18.09.2017 07:32:03", 12.5,  23234.244644],
                    ["18.09.2017 07:34:33", 13.5,   23111.373561],
                    ["18.09.2017 07:37:03", 13.1 ,12311.373633]],
                            columns = ["datetimecolumn","data1", "data2"])

range_b =pd.date_range(start='2017-10-01',end='2017-10-31',freq='2.5min')
df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'])
df_b.set_index('t_index', inplace=True)
df_b=df_b.reindex(range_b, method='nearest')
df_b
    datetimecolumn  data1   data2
2017-10-01 00:00:00 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:02:30 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:05:00 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:07:30 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:10:00 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:12:30 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:15:00 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:17:30 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:20:00 18.09.2017 07:37:03 13.1    12311.373633
2017-10-01 00:22:30 18.09.2017 07:37:03 13.1    12311.373633

Answer 1 (score: 0)

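As the error message says, the index of df_b contains values that are neither monotonic increasing nor monotonic decreasing. In other words, df_b['datetimecolumn'] is not sorted, but df.reindex(new_index, method='nearest') requires the index of df to be sorted for method='nearest' to work.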

Solution:

df_b = df_b.sort_index().reindex(range, method='nearest')
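A minimal runnable sketch of the idea that the index must be sorted first, using sort_index() on hypothetical data. Sorting only reorders the rows; if the timestamps themselves were parsed wrongly, the sorted index is still wrong.

```python
import pandas as pd

# Hypothetical frame with an unsorted (but correctly parsed) DatetimeIndex.
idx = pd.DatetimeIndex(["2017-10-01 00:05:00", "2017-10-01 00:00:00",
                        "2017-10-01 00:02:40"])
df_b = pd.DataFrame({"data1": [13.5, 13.4, 12.1]}, index=idx)

grid = pd.date_range(start="2017-10-01", end="2017-10-01 00:05", freq="2.5min")

# sort_index() makes the index monotonic increasing, which method="nearest" needs.
out = df_b.sort_index().reindex(grid, method="nearest")
print(out["data1"].tolist())  # [13.4, 12.1, 13.5]
```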

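Note that, contrary to the other answer, naming the variable after the built-in range does not actually matter here. It should not affect the pandas code. It does, of course, shadow the built-in function range, which is why naming variables after built-ins is a bad idea.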
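A small illustration of the shadowing point: binding a DatetimeIndex to the name range does not affect pandas, but a later call to the built-in range() in the same scope fails. The function shadowing_demo is hypothetical and only exists to keep the shadowing local.

```python
import pandas as pd

def shadowing_demo():
    # Inside this function, "range" now refers to a DatetimeIndex.
    range = pd.date_range(start="2017-10-01", end="2017-10-01 00:05", freq="2.5min")
    print(len(range))  # 3: pandas itself is unaffected
    try:
        range(3)  # the built-in is no longer reachable under this name
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)  # 'DatetimeIndex' object is not callable
```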

Answer 2 (score: 0)

A partial "answer":

I found the cause, but don't have a final solution yet:

My CSV contains German dates, i.e. the day comes before the month. to_datetime converts them like this:

30.09.2017 23:56:03  ->  2017-09-30 23:56:03
30.09.2017 23:58:33  ->  2017-09-30 23:58:33
01.10.2017 00:01:03  ->  2017-01-10 00:01:03
01.10.2017 00:03:33  ->  2017-01-10 00:03:33

(Note what happens at the switch from September to October: 01.10.2017 is read as January 10.)
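Whether a bare pd.to_datetime call swaps day and month depends on the pandas version: pandas 2.0 and later infer one format from the first value and apply it to the whole column, while the older version used here evidently parsed each value on its own. A version-independent sanity check, sketched under the assumption that every string follows the dd.mm.yyyy HH:MM:SS pattern, is to format the parsed timestamps back to text and compare with the originals (find_misparsed is a hypothetical helper):

```python
import pandas as pd

# Original strings as in the CSV (German day.month.year order).
raw = pd.Series(["30.09.2017 23:56:03", "30.09.2017 23:58:33",
                 "01.10.2017 00:01:03", "01.10.2017 00:03:33"])
fmt = "%d.%m.%Y %H:%M:%S"

def find_misparsed(raw, parsed, fmt):
    # A correct parse reproduces the original text exactly when formatted back.
    roundtrip = parsed.dt.strftime(fmt)
    return raw[roundtrip != raw].tolist()

# Simulated faulty month-first result, matching the conversions reported above.
bad = pd.to_datetime(pd.Series(["2017-09-30 23:56:03", "2017-09-30 23:58:33",
                                "2017-01-10 00:01:03", "2017-01-10 00:03:33"]))
good = pd.to_datetime(raw, format=fmt)

print(find_misparsed(raw, bad, fmt))   # ['01.10.2017 00:01:03', '01.10.2017 00:03:33']
print(find_misparsed(raw, good, fmt))  # []
```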

Edit: final solution

df_b['t_index']=pd.to_datetime(df_b['datetimecolumn'],format='%d.%m.%Y %H:%M:%S')

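That did the trick: it converts the German day.month.year order to year-month-day correctly. That is why the index was "unsorted" after calling to_datetime.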
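A minimal end-to-end sketch of this fix, assuming the dot-separated format shown in the question and a few hypothetical rows around the month boundary: parse with an explicit format (dayfirst=True is an alternative), confirm the index is monotonic, then reindex onto the 2.5-minute grid.

```python
import pandas as pd

# Hypothetical rows around the September/October boundary.
df_b = pd.DataFrame({
    "datetimecolumn": ["30.09.2017 23:56:03", "30.09.2017 23:58:33",
                       "01.10.2017 00:01:03", "01.10.2017 00:03:33"],
    "data1": [14.4, 13.1, 12.5, 13.5],
})

# Explicit day-first format: "01.10.2017" becomes 1 October, not 10 January.
df_b["t_index"] = pd.to_datetime(df_b["datetimecolumn"], format="%d.%m.%Y %H:%M:%S")
df_b = df_b.set_index("t_index")
assert df_b.index.is_monotonic_increasing

range1 = pd.date_range(start="2017-10-01", end="2017-10-01 00:05", freq="2.5min")
out = df_b.reindex(range1, method="nearest")
print(out["data1"].tolist())  # [12.5, 13.5, 13.5]
```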

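Also, the "funny" thing is that whenever the day is > 12 (so it cannot be a month), the conversion is correct, and since the file starts in the middle of a month, the problem was not visible with .head()...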

Thanks for the help!