I'm trying to create a piece of runnable code that has the following DataFrame embedded in it:
Contract  Date
201507    2014-06-18    1462.6
          2014-07-03    1518.6
          2014-09-05      10.2
201510    2015-09-14     977.9
201607    2016-05-17    1062.0
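I'd like to be able to serialize my existing DataFrame and paste it into code, so that I can share a ready-to-run example in another question on StackOverflow without exporting to CSV or similar.
How can I do that?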
EDIT
Output of to_dict(), missing the index:
{(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
(201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
(201604, Timestamp('2016-01-08 00:00:00')): 890.5,
(201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
(201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}
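For reference, pandas 1.4 and later also offer to_dict(orient="tight"), which stores the index and column labels together with their names, so a MultiIndex like this one should survive the round trip. This is a minimal sketch assuming pandas >= 1.4; the column name "Price" is a made-up placeholder, and the level names "Contract" and "Date" are read off the printout above.

```python
import pandas as pd
from pandas import Timestamp

# A small frame shaped like the one in the question.
# "Price" is a hypothetical column name; the level names come from the printout.
index = pd.MultiIndex.from_tuples(
    [(201510, Timestamp("2015-07-21")),
     (201510, Timestamp("2015-08-10")),
     (201604, Timestamp("2016-01-08"))],
    names=["Contract", "Date"],
)
df = pd.DataFrame({"Price": [987.9, 973.6, 890.5]}, index=index)

# orient="tight" keeps index tuples, column labels, data rows,
# and the index/column names in one plain dict
tight = df.to_dict(orient="tight")
print(tight)  # this printed dict is what you would paste into the question

# Rebuilding from the pasted dict restores the MultiIndex and its names
df2 = pd.DataFrame.from_dict(tight, orient="tight")
```

The default orient only maps column -> {index tuple -> value}, which is why the level names are lost; "tight" carries them in the index_names and column_names entries.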
Answer
Perhaps the .to_dict method could do what you need?
In [22]: df
Out[22]:
0 1 2 3
first second
bar one 0.857213 2.541895 0.632027 -0.723664
two 0.670757 0.131845 0.443510 -0.215069
baz one 0.244309 0.355917 1.369525 0.016433
two 0.306323 1.997372 -0.034486 -0.632124
foo one 1.899891 0.978404 -1.326377 -0.379395
two -0.258645 1.334551 -0.002280 -0.570494
qux one 0.956760 1.516873 0.145715 0.548522
two -0.935483 -0.613533 -0.259667 1.678930
In [23]: df_dict = df.to_dict()
In [24]: df_dict
Out[24]:
{0: {('bar', 'one'): 0.8572134743227553,
('bar', 'two'): 0.67075702403871984,
('baz', 'one'): 0.24430909274954596,
('baz', 'two'): 0.30632263405892973,
('foo', 'one'): 1.8998914080547422,
('foo', 'two'): -0.25864498582941658,
('qux', 'one'): 0.95676035178925078,
('qux', 'two'): -0.93548268578556593},
1: {('bar', 'one'): 2.5418951943252983,
('bar', 'two'): 0.13184487691403465,
('baz', 'one'): 0.35591677598165794,
('baz', 'two'): 1.9973715806631951,
('foo', 'one'): 0.97840399034039371,
('foo', 'two'): 1.334550971309663,
('qux', 'one'): 1.5168730423092398,
('qux', 'two'): -0.61353256979962567},
2: {('bar', 'one'): 0.63202740995444018,
('bar', 'two'): 0.44350955006551607,
('baz', 'one'): 1.3695250782939834,
('baz', 'two'): -0.034485597227602881,
('foo', 'one'): -1.32637743164928,
('foo', 'two'): -0.0022801431751758058,
('qux', 'one'): 0.14571459315814703,
('qux', 'two'): -0.25966683560443388},
3: {('bar', 'one'): -0.72366363290625402,
('bar', 'two'): -0.21506930103507182,
('baz', 'one'): 0.016432503332560005,
('baz', 'two'): -0.63212432354247639,
('foo', 'one'): -0.37939466798831689,
('foo', 'two'): -0.57049399142274893,
('qux', 'one'): 0.54852179259808065,
('qux', 'two'): 1.6789299753495908}}
In [25]: pd.DataFrame(df_dict)
Out[25]:
0 1 2 3
bar one 0.857213 2.541895 0.632027 -0.723664
two 0.670757 0.131845 0.443510 -0.215069
baz one 0.244309 0.355917 1.369525 0.016433
two 0.306323 1.997372 -0.034486 -0.632124
foo one 1.899891 0.978404 -1.326377 -0.379395
two -0.258645 1.334551 -0.002280 -0.570494
qux one 0.956760 1.516873 0.145715 0.548522
two -0.935483 -0.613533 -0.259667 1.678930
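The session above can be condensed into a standalone script. This is a sketch with small made-up values instead of random ones; it also makes visible that the default to_dict() output does not carry the index level names ("first" and "second" here), so the rebuilt frame matches the original only when names are ignored.

```python
import pandas as pd

# Hypothetical two-level index, similar to the one in the session above
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df = pd.DataFrame([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], index=index)

# Default orient="dict": {column -> {index tuple -> value}}
df_dict = df.to_dict()

# Passing the dict (or its pasted printout) to the constructor rebuilds the
# MultiIndex from the tuple keys, but the level names are not in the dict
df2 = pd.DataFrame(df_dict)
```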
You can copy the dict output and paste it into the pd.DataFrame constructor. This also works with Timestamps, provided you first run from pandas import Timestamp:
In [37]: from pandas import Timestamp
In [38]: df2.to_dict()
Out[38]:
{0: {0: Timestamp('2011-01-01 05:00:00'),
1: Timestamp('2011-01-01 06:00:00'),
2: Timestamp('2011-01-01 07:00:00'),
3: Timestamp('2011-01-01 08:00:00'),
4: Timestamp('2011-01-01 09:00:00')}}
In [39]: {0: {0: Timestamp('2011-01-01 05:00:00'),
....: 1: Timestamp('2011-01-01 06:00:00'),
....: 2: Timestamp('2011-01-01 07:00:00'),
....: 3: Timestamp('2011-01-01 08:00:00'),
....: 4: Timestamp('2011-01-01 09:00:00')}}
Out[39]:
{0: {0: Timestamp('2011-01-01 05:00:00'),
1: Timestamp('2011-01-01 06:00:00'),
2: Timestamp('2011-01-01 07:00:00'),
3: Timestamp('2011-01-01 08:00:00'),
4: Timestamp('2011-01-01 09:00:00')}}
In [40]: pd.DataFrame({0: {0: Timestamp('2011-01-01 05:00:00'),
....: 1: Timestamp('2011-01-01 06:00:00'),
....: 2: Timestamp('2011-01-01 07:00:00'),
....: 3: Timestamp('2011-01-01 08:00:00'),
....: 4: Timestamp('2011-01-01 09:00:00')}})
Out[40]:
0
0 2011-01-01 05:00:00
1 2011-01-01 06:00:00
2 2011-01-01 07:00:00
3 2011-01-01 08:00:00
4 2011-01-01 09:00:00
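The reason the import matters is that the pasted text is ordinary Python source: the repr of each value is Timestamp('...'), and that name has to resolve when the code runs. The sketch below simulates "copy from the console, paste into code" with repr and eval; eval is used here only for the demonstration, and the three timestamps are illustrative.

```python
import pandas as pd
from pandas import Timestamp  # the pasted text refers to this name

df = pd.DataFrame({0: pd.to_datetime(["2011-01-01 05:00",
                                      "2011-01-01 06:00",
                                      "2011-01-01 07:00"])})

# The text you would copy from the console into your question
text = repr(df.to_dict())

# Simulated paste: the expression can only be evaluated if Timestamp
# is available in the namespace it runs in
pasted = eval(text, {"Timestamp": Timestamp})
df2 = pd.DataFrame(pasted)
```

The same applies to other names that can show up in the printed dict: missing values appear as nan (for example from numpy import nan) or NaT (from pandas import NaT).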
df["colname"]
Here is how I deserialized your dictionary:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from pandas import Timestamp
In [4]: d = {(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
...: (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
...: (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
...: (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
...: (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}
In [5]: pd.DataFrame(d)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-62c9f5619d37> in <module>()
----> 1 pd.DataFrame(d)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
221 dtype=dtype, copy=copy)
222 elif isinstance(data, dict):
--> 223 mgr = self._init_dict(data, index, columns, dtype=dtype)
224 elif isinstance(data, ma.MaskedArray):
225 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
357 arrays = [data[k] for k in keys]
358
--> 359 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
360
361 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5238 # figure out the index, if necessary
5239 if index is None:
-> 5240 index = extract_index(arrays)
5241 else:
5242 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5277
5278 if not indexes and not raw_lengths:
-> 5279 raise ValueError('If using all scalar values, you must pass'
5280 ' an index')
5281
ValueError: If using all scalar values, you must pass an index
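The error occurs because every value in d is a scalar, so the constructor has no inner mapping from which to derive row labels. Besides going through a Series, another option (a sketch of an alternative, not what this answer does next) is to nest the dict one level, which turns it into a single column indexed by the tuple keys; the column name "value" is a placeholder.

```python
import pandas as pd
from pandas import Timestamp

# The dict from the question, pasted as-is
d = {(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
     (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
     (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
     (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
     (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}

# {column -> {index tuple -> value}}: the tuple keys become a MultiIndex
df = pd.DataFrame({"value": d})  # "value" is a hypothetical column name
```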
In [6]: S = pd.Series(d)
In [7]: S
Out[7]:
201510 2015-07-21 987.9
2015-08-10 973.6
201604 2016-01-08 890.5
2016-01-19 837.2
201607 2016-03-29 955.8
dtype: float64
In [8]: df = pd.DataFrame(S)
In [9]: df
Out[9]:
0
201510 2015-07-21 987.9
2015-08-10 973.6
201604 2016-01-08 890.5
2016-01-19 837.2
201607 2016-03-29 955.8
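Because to_dict() on the original object did not record the level names, the frame above has an unnamed index. If you want it to match the printout in the question, you can set the names again after rebuilding. This is a sketch that follows the same Series route as above; the names "Contract" and "Date" are taken from the question's display.

```python
import pandas as pd
from pandas import Timestamp

d = {(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
     (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
     (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
     (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
     (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}

# Same route as the session above: tuple keys -> MultiIndex Series -> DataFrame
df = pd.Series(d).to_frame()

# Restore the level names that the dict did not carry
df = df.rename_axis(["Contract", "Date"])
```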