I am reading data from a MongoDB collection:
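import pandas as pd
import pymongo
from pandas import DataFrame

mongo_url = 'mongodb://localhost:27017/db'
client = pymongo.MongoClient(mongo_url)
db = client.db
collection = db.coll
# Sort on "Date" (MongoDB field names are case-sensitive), matching the projected field
docs = list(collection.find({}, {"Date": 1, "Cost": 1, "_id": 0}).sort("Date", pymongo.ASCENDING))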
So I end up with a list of dicts stored in docs, of the form:
[{u'Date': u'2008-01-01', u'Cost': 8557.0}, {u'Date': u'2008-01-02', u'Cost': 62307.0},.....]
I can then create a DataFrame from this:
frame = DataFrame(docs)
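This gives a frame with Cost and Date columns, but I want to use the Date column as a DatetimeIndex. I have been doing this in a very hacky way, and I know there must be a cleaner way to do it: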
frame = frame.set_index(pd.to_datetime(frame['Date']))
Also, if I check the index, I see that freq is not set, so I would like to set a daily frequency when creating the DataFrame.
If I try this:
frame = DataFrame(docs)
frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
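for some reason I get the following error:
ValueError: Inferred frequency None from passed dates does not conform to passed frequency D
But the other suggestion works fine for me: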
idx = pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
frame = DataFrame(docs, index=idx)
frame = frame.drop('Date', axis=1)
Answer 0 (score: 2)
If you need to pass the DatetimeIndex to the DataFrame constructor:
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
idx = pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
print (idx)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
frame = pd.DataFrame(docs, index=idx)
print (frame)
Cost Date
2008-01-01 8557.0 2008-01-01
2008-01-02 62307.0 2008-01-02
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
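For comparison, here is a small sketch of how the freq attribute behaves when the index is built with pd.to_datetime, as in the question. The third row is made up for illustration, because pd.infer_freq needs at least three dates; has_freq and inferred are just local names.

```python
import pandas as pd

docs = [
    {"Date": "2008-01-01", "Cost": 8557.0},
    {"Date": "2008-01-02", "Cost": 62307.0},
    {"Date": "2008-01-03", "Cost": 100.0},  # made-up row, only so there are three dates
]

frame = pd.DataFrame(docs)
frame = frame.set_index(pd.to_datetime(frame["Date"]))

# The dates are parsed, but no frequency is attached to the index.
has_freq = frame.index.freq is not None

# pandas can still tell that the dates are evenly spaced by one day.
inferred = pd.infer_freq(frame.index)

print(has_freq, inferred)
```

So the index is a DatetimeIndex either way; the difference is only whether a frequency is stored on it.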
Another solution, if the DatetimeIndex is created after the DataFrame: you can use set_index and then convert the index to a DatetimeIndex:
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
Cost Date
0 8557.0 2008-01-01
1 62307.0 2008-01-02
frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
Cost
2008-01-01 8557.0
2008-01-02 62307.0
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
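This is the same code that raised the ValueError in the question, and it works here because the two sample dates are consecutive. I can only guess at the cause on the real data, but one way to reproduce that kind of error is a missing day. The sketch below uses made-up dates (gappy is hypothetical) and shows DataFrame.asfreq as one possible way to get a daily index anyway, at the cost of NaN rows for the missing days.

```python
import pandas as pd

# Hypothetical dates with 2008-01-02 missing.
gappy = ["2008-01-01", "2008-01-03", "2008-01-04"]

try:
    pd.DatetimeIndex(gappy, freq="D")
    raised = False
except ValueError as exc:
    # The dates are not evenly spaced by one day, so freq='D' is rejected.
    raised = True
    print(exc)

# Build the index without a frequency first, then reindex to daily frequency.
frame = pd.DataFrame({"Cost": [1.0, 2.0, 3.0]}, index=pd.to_datetime(gappy))
daily = frame.asfreq("D")  # inserts a NaN row for each missing day

print(daily)
print(daily.index.freq)
```

Whether NaN rows are acceptable depends on what you do with the frame afterwards; if they are not, leaving freq unset may be the simpler choice.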
If you need to copy the Date column to the index (keeping the column as well):
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
Cost Date
0 8557.0 2008-01-01
1 62307.0 2008-01-02
frame.set_index(frame.Date, inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
Cost Date
2008-01-01 8557.0 2008-01-01
2008-01-02 62307.0 2008-01-02
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
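If you do this in several places, the steps above could be wrapped in a small function. This is only a sketch: docs_to_frame and its date_key and freq parameters are names I made up, and the fallback to an index without a frequency is my own assumption about what you would want when the dates do not conform.

```python
import pandas as pd


def docs_to_frame(docs, date_key="Date", freq="D"):
    """Hypothetical helper: DataFrame from a list of dicts, indexed by parsed dates."""
    frame = pd.DataFrame(docs)
    # pop() removes the date column from the frame and returns it as a Series.
    dates = pd.to_datetime(frame.pop(date_key))
    try:
        frame.index = pd.DatetimeIndex(dates, freq=freq, name=date_key)
    except ValueError:
        # Dates do not match the requested frequency: keep them, leave freq unset.
        frame.index = pd.DatetimeIndex(dates, name=date_key)
    return frame


docs = [{"Date": "2008-01-01", "Cost": 8557.0},
        {"Date": "2008-01-02", "Cost": 62307.0}]
frame = docs_to_frame(docs)
print(frame)
print(frame.index)
```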