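I have a pandas.DataFrame object indexed by datetime, obtained via pandas.read_csv. The data has a frequency of 10 minutes. I want to select the period from 2014-06-15 00:00:00 to 2014-07-01 00:00:00. When I write
a=df["2014-06-15 00:00:00":"2014-07-01 00:00:00"]
the data actually starts at 2014-06-15 00:10:00 rather than 2014-06-15 00:00:00. However, if I write
a=df["2014-06-15 00:00":"2014-07-01 00:00"]
(omitting the seconds), I get the expected behavior, i.e. data starting at 2014-06-15 00:00:00. Am I missing something? I am using pandas version 0.16.0.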
Edit
MWE data:
a,b,c
2014-06-14 23:10, 3.809, 103.0
2014-06-14 23:20, 2.935, 83.0
2014-06-14 23:30, 1.923, 73.0
2014-06-14 23:40, 2.843, 89.0
2014-06-14 23:50, 1.785, 125.0
2014-06-15 00:00, 2.383, 114.0
2014-06-15 00:10, 3.717, 94.0
2014-06-15 00:20, 5.005, 91.0
2014-06-15 00:30, 3.901, 97.0
2014-06-15 00:40, 3.395, 98.0
2014-06-15 00:50, 1.095, 36.0
2014-06-15 01:00, 2.383, 67.0
2014-06-15 01:10, 2.199, 98.0
2014-06-15 01:20, 3.533, 82.0
2014-06-15 01:30, 1.969, 81.0
2014-06-15 01:40, 2.705, 78.0
2014-06-15 01:50, 3.579, 52.0
2014-06-15 02:00, 2.613, 81.0
2014-06-15 02:10, 3.671, 71.0
2014-06-15 02:20, 4.591, 94.0
2014-06-15 02:30, 4.499, 84.0
2014-06-15 02:40, 2.383, 26.0
2014-06-15 02:50, 1.555, 86.0
2014-06-15 03:00, 2.061, 179.0
2014-06-15 03:10, 1.693, 299.0
2014-06-15 03:20, 2.705, 114.0
2014-06-15 03:30, 1.647, 104.0
2014-06-15 03:40, 3.027, 306.0
MWE code:
import pandas as pd
df=pd.read_csv("mwe.csv", index_col=0)
a=df["2014-06-15 00:00:00":]
print(a)
PS: I just found that this code does not work under pandas 0.14.
Answer 0 (score: 1)
When the csv is parsed like this (without specifying the parse_dates parameter):
df = pd.read_csv("mwe.csv", index_col=0)
no attempt is made to parse the strings as dates. The Index therefore has dtype object, and the values in the index are strings.
In [45]: df.index
Out[45]: Index([u'2014-06-14 23:10', u'2014-06-14 23:20', u'2014-06-14 23:30', u'2014-06-14 23:40', u'2014-06-14 23:50', u'2014-06-15 00:00', u'2014-06-15 00:10', u'2014-06-15 00:20', u'2014-06-15 00:30', u'2014-06-15 00:40', u'2014-06-15 00:50', u'2014-06-15 01:00', u'2014-06-15 01:10', u'2014-06-15 01:20', u'2014-06-15 01:30', u'2014-06-15 01:40', u'2014-06-15 01:50', u'2014-06-15 02:00', u'2014-06-15 02:10', u'2014-06-15 02:20', u'2014-06-15 02:30', u'2014-06-15 02:40', u'2014-06-15 02:50', u'2014-06-15 03:00', u'2014-06-15 03:10', u'2014-06-15 03:20', u'2014-06-15 03:30', u'2014-06-15 03:40'], dtype='object')
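A quick way to confirm this diagnosis is to inspect the index directly. The following is a minimal sketch, assuming a two-row excerpt of the MWE data held in an in-memory string (the names csv_text, df_raw, is_datetime_index and first_label are illustrative, not from the original post):

```python
import io

import pandas as pd

# Two rows from the MWE data, kept in memory instead of mwe.csv.
csv_text = (
    "a,b,c\n"
    "2014-06-15 00:00, 2.383, 114.0\n"
    "2014-06-15 00:10, 3.717, 94.0\n"
)

# Same call as in the question: no parse_dates, so nothing is parsed as a date.
df_raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

# False here means the index is not a DatetimeIndex.
is_datetime_index = isinstance(df_raw.index, pd.DatetimeIndex)

# The individual labels are plain strings, not Timestamps.
first_label = df_raw.index[0]
print(is_datetime_index, type(first_label))
```

If is_datetime_index is False, any slicing on the index is done on strings rather than on points in time.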
The string "2014-06-15 00:00:00" falls between u'2014-06-15 00:00' and u'2014-06-15 00:10', because strings are ordered lexicographically and u < v if u is a proper prefix of v:
In [49]: u'2014-06-15 00:00' < u"2014-06-15 00:00:00" < u'2014-06-15 00:10'
Out[49]: True
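(Internally, the strings are converted to unicode before the comparison is made.)
The way to fix this is to parse the date-like strings into actual dates: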
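To see why the slice start lands on the 00:10 row, the lookup can be imitated with plain Python strings. This is only a sketch: bisect is used here as a stand-in for a sorted-label search and is an assumption about the general idea, not a description of pandas internals. The variable names are illustrative.

```python
import bisect

# A sorted excerpt of the string labels from the index above.
labels = [
    "2014-06-14 23:50",
    "2014-06-15 00:00",
    "2014-06-15 00:10",
    "2014-06-15 00:20",
]

key_with_seconds = "2014-06-15 00:00:00"
key_without_seconds = "2014-06-15 00:00"

# A proper prefix sorts before any longer string that extends it.
prefix_is_smaller = key_without_seconds < key_with_seconds

# Find the first label that is >= each key.
start_with_seconds = labels[bisect.bisect_left(labels, key_with_seconds)]
start_without_seconds = labels[bisect.bisect_left(labels, key_without_seconds)]

print(start_with_seconds)     # 2014-06-15 00:10
print(start_without_seconds)  # 2014-06-15 00:00
```

The key with seconds matches no label exactly and sorts after "2014-06-15 00:00", so the first label at or after it is the 00:10 one.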
df = pd.read_csv("mwe.csv", index_col=0)
df.index = pd.DatetimeIndex(df.index)
or
df = pd.read_csv("mwe.csv", index_col=0, parse_dates=[0])
Then df["2014-06-15 00:00:00":] and df["2014-06-15 00:00":] both return the expected result:
In [57]: df["2014-06-15 00:00:00":].index[0]
Out[57]: Timestamp('2014-06-15 00:00:00')
In [58]: df["2014-06-15 00:00":].index[0]
Out[58]: Timestamp('2014-06-15 00:00:00')
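The fix can be checked end to end without a file on disk. The sketch below assumes a four-row excerpt of the MWE data in an in-memory string and uses .loc for the row slice, which is a choice made here for clarity rather than what the original post used. The names csv_text, df_parsed, first_with_seconds and first_without_seconds are illustrative.

```python
import io

import pandas as pd

# Four rows from the MWE data, kept in memory instead of mwe.csv.
csv_text = (
    "a,b,c\n"
    "2014-06-14 23:50, 1.785, 125.0\n"
    "2014-06-15 00:00, 2.383, 114.0\n"
    "2014-06-15 00:10, 3.717, 94.0\n"
    "2014-06-15 00:20, 5.005, 91.0\n"
)

# parse_dates=[0] turns the first column (used as the index) into datetimes.
df_parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=[0])

# With a DatetimeIndex, both spellings of the start label select the same rows.
first_with_seconds = df_parsed.loc["2014-06-15 00:00:00":].index[0]
first_without_seconds = df_parsed.loc["2014-06-15 00:00":].index[0]

print(first_with_seconds, first_without_seconds)
```

Both slices start at the 00:00 row because the labels are now compared as points in time rather than as text.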