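I am using the Pandas library that ships with Anaconda, on Python 2.7.9.
My question is twofold.
I have several datasets with date and time fields, but unfortunately the instrument that created them did not format the dates consistently. They are all in DD/MM/YYYY format, but the instrument seems to have randomly dropped the leading zeros on the month and day for about half of the dates. Pandas cannot read them correctly (from an Excel file): the dataset starts on 10 April, but Pandas always starts it at 2014-10-04, leaves some dates unconverted in between (whenever the day is greater than 12), and then starts reading them as YYYY-MM-DD, which makes sense given the input dates. Is there a way to force Pandas to read these dates correctly, and to concatenate the date and time fields and use the result as the index instead of the assigned integers? I tried writing a converter function for the Date field and plugging it in to format the dates properly, but for some reason it is applied after Pandas has already misread the dates, so the formatting comes out wrong.
Since I want to index this data as a time series, what I have been doing is simply creating a date/time range and setting it as the DataFrame's index, which has worked fine. The exception is this dataset, which has two days of data where the instrument apparently started sampling once per minute instead of once every 10 minutes. Is there a way to assign the index and force it to keep only the matching records? Failing that, I have been trying to query the DataFrame for only the rows where the minute ends in 0, or to drop the other records specifically, but with no success at all. I really don't know how to go about it.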
Here's a link to a csv with sample dates.
Beyond that, here is what I have tried:
In[168]: ddata = ddata[str(ddata[' Time'])[:5].endswith('0')]
Traceback (most recent call last):
File "C:\Users\Tom\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-156-098b3e02871f>", line 1, in <module>
ddata = ddata[str(ddata[' Time'])[:5].endswith('0')]
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\generic.py", line 1052, in _get_item_cache
values = self._data.get(item)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\internals.py", line 2565, in get
loc = self.items.get_loc(item)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\index.py", line 1181, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "index.pyx", line 129, in pandas.index.IndexEngine.get_loc (pandas\index.c:3656)
File "index.pyx", line 149, in pandas.index.IndexEngine.get_loc (pandas\index.c:3534)
File "hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11911)
File "hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:11864)
KeyError: False
In[169]: ddata1 = ddata.query('Time[4] == 0')
Traceback (most recent call last):
File "C:\Users\Tom\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-166-48cd98cf78bd>", line 1, in <module>
ddata1 = ddata.query('Time[4] == 0')
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1816, in query
res = self.eval(expr, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1868, in eval
return _eval(expr, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\eval.py", line 235, in eval
ret = eng_inst.evaluate()
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\engines.py", line 69, in evaluate
self.result_type, self.aligned_axes = _align(self.expr.terms)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\align.py", line 136, in _align
typ, axes = _align_core(terms)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\align.py", line 54, in wrapper
return _result_type_many(*term_values), None
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\common.py", line 17, in _result_type_many
return np.result_type(*arrays_and_dtypes)
TypeError: data type not understood
In[170]: ddata1 = ddata.query('str(Time)[4] == 0')
Traceback (most recent call last):
File "C:\Users\Tom\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-167-452d91f45daf>", line 1, in <module>
ddata1 = ddata.query('str(Time)[4] == 0')
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1816, in query
res = self.eval(expr, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\core\frame.py", line 1868, in eval
return _eval(expr, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\eval.py", line 230, in eval
truediv=truediv)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 635, in __init__
self.terms = self.parse()
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 652, in parse
return self._visitor.visit(self.expr)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 320, in visit_Module
return self.visit(expr, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 323, in visit_Expr
return self.visit(node.value, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 560, in visit_Compare
return self.visit(binop)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 404, in visit_BinOp
op, op_class, left, right = self._possibly_transform_eq_ne(node)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 355, in _possibly_transform_eq_ne
left = self.visit(node.left, side='left')
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 440, in visit_Subscript
value = self.visit(node.value)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 314, in visit
return visitor(node, **kwargs)
File "C:\Users\Tom\Anaconda\lib\site-packages\pandas\computation\expr.py", line 205, in f
"implemented".format(node_name))
NotImplementedError: 'Call' nodes are not implemented
Answer 0 (score: 1)
I tried this on the csv you linked to, and it seems to work for me:
df.Date = pd.datetools.to_datetime(df.Date)
df.Date.head()
Out[972]:
0 2014-05-31
1 2014-05-31
2 2014-05-31
3 2014-05-31
4 2014-05-31
Name: Date, dtype: datetime64[ns]
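The call above relies on pandas inferring the date format, which is where day-first dates such as 10/04/2014 can be swapped. A minimal sketch of a more explicit approach follows, assuming the Date and Time columns have been read as strings. The sample rows, the `Value` column and the `Timestamp` index name are made up for illustration. It uses the top-level `pd.to_datetime` with an explicit `format`, then combines date and time into the index as the question asks.

```python
import pandas as pd

# Hypothetical sample mimicking the data described in the question:
# DD/MM/YYYY dates with leading zeros dropped inconsistently, plus a
# separate time column. Both columns are assumed to be strings.
df = pd.DataFrame({
    "Date": ["10/04/2014", "10/4/2014", "13/04/2014", "1/5/2014"],
    "Time": ["00:00:00", "00:10:00", "00:20:00", "00:30:00"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# An explicit format removes the day/month ambiguity. %d and %m accept
# both one-digit and two-digit values when parsing.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"],
                       format="%d/%m/%Y %H:%M:%S")

# Use the combined timestamp as the index instead of the default integers.
df.index = pd.DatetimeIndex(stamp, name="Timestamp")
df = df.drop(["Date", "Time"], axis=1)
```

If the columns come out of the Excel reader already partly converted, they would need to be read as plain text first for this to apply; that step depends on the file and is not shown here.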
For the second part of your question, you can slice the DataFrame like this:
df[df.Time.map(lambda x: x.minute % 10 == 0)]
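The line above assumes the Time column holds time objects with a `.minute` attribute. If the timestamps are already in a `DatetimeIndex`, the same filter can be written against the index. The sketch below uses made-up sample data and also shows one way to "keep only the matching records" against a regular 10-minute grid, as the question asks; treat it as an illustration rather than the only approach.

```python
import pandas as pd

# Hypothetical data: 10-minute samples with a short burst of 1-minute samples.
idx = pd.to_datetime([
    "2014-04-10 00:00", "2014-04-10 00:10",
    "2014-04-10 00:11", "2014-04-10 00:12",
    "2014-04-10 00:20", "2014-04-10 00:30",
])
df = pd.DataFrame({"Value": list(range(len(idx)))}, index=idx)

# Boolean mask from the index: keep rows whose minute is a multiple of 10.
on_grid = df[df.index.minute % 10 == 0]

# Alternative: build the expected 10-minute grid and keep only the rows
# whose timestamp appears in it.
grid = pd.date_range(df.index.min(), df.index.max(), freq="10min")
matched = df[df.index.isin(grid)]
```

The two results agree here because the first timestamp falls on a 10-minute boundary with zero seconds; the grid approach is stricter, since it also rejects rows with non-zero seconds.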