What is the best way to serialize a DataFrame inline?

Asked: 2016-07-10 19:48:13

Tags: python pandas

I am trying to create a piece of executable code that has the following DataFrame embedded in it:

Contract  Date      
201507    2014-06-18    1462.6
          2014-07-03    1518.6
          2014-09-05      10.2
201510    2015-09-14     977.9
201607    2016-05-17    1062.0

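I would like to serialize my existing DataFrame and paste it into the code, so that I can share an immediately executable example in another question on StackOverflow without having to export to CSV or similar.

How can I do this?

Edit

Output of to_dict(), with the index missing: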

{(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
 (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
 (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
 (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
 (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}

1 Answer:

Answer 0 (score: 2)

Perhaps the .to_dict method can do what you need?

In [22]: df
Out[22]: 
                     0         1         2         3
first second                                        
bar   one     0.857213  2.541895  0.632027 -0.723664
      two     0.670757  0.131845  0.443510 -0.215069
baz   one     0.244309  0.355917  1.369525  0.016433
      two     0.306323  1.997372 -0.034486 -0.632124
foo   one     1.899891  0.978404 -1.326377 -0.379395
      two    -0.258645  1.334551 -0.002280 -0.570494
qux   one     0.956760  1.516873  0.145715  0.548522
      two    -0.935483 -0.613533 -0.259667  1.678930

In [23]: df_dict = df.to_dict()

In [24]: df_dict
Out[24]: 
{0: {('bar', 'one'): 0.8572134743227553,
  ('bar', 'two'): 0.67075702403871984,
  ('baz', 'one'): 0.24430909274954596,
  ('baz', 'two'): 0.30632263405892973,
  ('foo', 'one'): 1.8998914080547422,
  ('foo', 'two'): -0.25864498582941658,
  ('qux', 'one'): 0.95676035178925078,
  ('qux', 'two'): -0.93548268578556593},
 1: {('bar', 'one'): 2.5418951943252983,
  ('bar', 'two'): 0.13184487691403465,
  ('baz', 'one'): 0.35591677598165794,
  ('baz', 'two'): 1.9973715806631951,
  ('foo', 'one'): 0.97840399034039371,
  ('foo', 'two'): 1.334550971309663,
  ('qux', 'one'): 1.5168730423092398,
  ('qux', 'two'): -0.61353256979962567},
 2: {('bar', 'one'): 0.63202740995444018,
  ('bar', 'two'): 0.44350955006551607,
  ('baz', 'one'): 1.3695250782939834,
  ('baz', 'two'): -0.034485597227602881,
  ('foo', 'one'): -1.32637743164928,
  ('foo', 'two'): -0.0022801431751758058,
  ('qux', 'one'): 0.14571459315814703,
  ('qux', 'two'): -0.25966683560443388},
 3: {('bar', 'one'): -0.72366363290625402,
  ('bar', 'two'): -0.21506930103507182,
  ('baz', 'one'): 0.016432503332560005,
  ('baz', 'two'): -0.63212432354247639,
  ('foo', 'one'): -0.37939466798831689,
  ('foo', 'two'): -0.57049399142274893,
  ('qux', 'one'): 0.54852179259808065,
  ('qux', 'two'): 1.6789299753495908}}

In [25]: pd.DataFrame(df_dict)
Out[25]: 
                0         1         2         3
bar one  0.857213  2.541895  0.632027 -0.723664
    two  0.670757  0.131845  0.443510 -0.215069
baz one  0.244309  0.355917  1.369525  0.016433
    two  0.306323  1.997372 -0.034486 -0.632124
foo one  1.899891  0.978404 -1.326377 -0.379395
    two -0.258645  1.334551 -0.002280 -0.570494
qux one  0.956760  1.516873  0.145715  0.548522
    two -0.935483 -0.613533 -0.259667  1.678930

You can copy and paste the dictionary output into the pd.DataFrame constructor. This even works with datetime objects, provided you use from pandas import Timestamp:

In [37]: from pandas import Timestamp

In [38]: df2.to_dict()
Out[38]: 
{0: {0: Timestamp('2011-01-01 05:00:00'),
  1: Timestamp('2011-01-01 06:00:00'),
  2: Timestamp('2011-01-01 07:00:00'),
  3: Timestamp('2011-01-01 08:00:00'),
  4: Timestamp('2011-01-01 09:00:00')}}

In [39]: {0: {0: Timestamp('2011-01-01 05:00:00'),
   ....:   1: Timestamp('2011-01-01 06:00:00'),
   ....:   2: Timestamp('2011-01-01 07:00:00'),
   ....:   3: Timestamp('2011-01-01 08:00:00'),
   ....:   4: Timestamp('2011-01-01 09:00:00')}}
Out[39]: 
{0: {0: Timestamp('2011-01-01 05:00:00'),
  1: Timestamp('2011-01-01 06:00:00'),
  2: Timestamp('2011-01-01 07:00:00'),
  3: Timestamp('2011-01-01 08:00:00'),
  4: Timestamp('2011-01-01 09:00:00')}}

In [40]: pd.DataFrame({0: {0: Timestamp('2011-01-01 05:00:00'),
   ....:   1: Timestamp('2011-01-01 06:00:00'),
   ....:   2: Timestamp('2011-01-01 07:00:00'),
   ....:   3: Timestamp('2011-01-01 08:00:00'),
   ....:   4: Timestamp('2011-01-01 09:00:00')}})
Out[40]: 
                    0
0 2011-01-01 05:00:00
1 2011-01-01 06:00:00
2 2011-01-01 07:00:00
3 2011-01-01 08:00:00
4 2011-01-01 09:00:00
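
The two sessions above can be combined into one paste-ready script. This is a minimal sketch, not the exact frames shown: it reuses a few values from the outputs above in a small illustrative frame, and the names data and df are assumptions. Note that Out[25] no longer shows the first/second index level names, so the sketch sets them again by hand.

```python
import pandas as pd
from pandas import Timestamp  # needed so the pasted Timestamp(...) calls resolve

# Literal in the shape produced by df.to_dict(): {column: {index_key: value}}.
# Tuple keys are turned into the levels of a MultiIndex by the constructor.
# The combination of columns here is illustrative only.
data = {
    0: {
        ("bar", "one"): 0.8572134743227553,
        ("bar", "two"): 0.67075702403871984,
        ("baz", "one"): 0.24430909274954596,
    },
    1: {
        ("bar", "one"): Timestamp("2011-01-01 05:00:00"),
        ("bar", "two"): Timestamp("2011-01-01 06:00:00"),
        ("baz", "one"): Timestamp("2011-01-01 07:00:00"),
    },
}

df = pd.DataFrame(data)

# The dict does not carry index level names, so restore them explicitly.
df.index.names = ["first", "second"]

print(df)
```

Anyone who copies this block gets the frame back without needing a CSV file or any other attachment.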

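Edit

I am pretty sure the problem you are running into is that you are working with a Series, probably the result of a column slice such as df["colname"]. Here is how I deserialize your dictionary: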

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: from pandas import Timestamp

In [4]: d = {(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
   ...:  (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
   ...:  (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
   ...:  (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
   ...:  (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}

In [5]: pd.DataFrame(d)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-62c9f5619d37> in <module>()
----> 1 pd.DataFrame(d)

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    221                                  dtype=dtype, copy=copy)
    222         elif isinstance(data, dict):
--> 223             mgr = self._init_dict(data, index, columns, dtype=dtype)
    224         elif isinstance(data, ma.MaskedArray):
    225             import numpy.ma.mrecords as mrecords

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    357             arrays = [data[k] for k in keys]
    358 
--> 359         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    360 
    361     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5238     # figure out the index, if necessary
   5239     if index is None:
-> 5240         index = extract_index(arrays)
   5241     else:
   5242         index = _ensure_index(index)

/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
   5277 
   5278         if not indexes and not raw_lengths:
-> 5279             raise ValueError('If using all scalar values, you must pass'
   5280                              ' an index')
   5281 

ValueError: If using all scalar values, you must pass an index

In [6]: S = pd.Series(d)

In [7]: S
Out[7]: 
201510  2015-07-21    987.9
        2015-08-10    973.6
201604  2016-01-08    890.5
        2016-01-19    837.2
201607  2016-03-29    955.8
dtype: float64

In [8]: df = pd.DataFrame(S)

In [9]: df
Out[9]: 
                       0
201510 2015-07-21  987.9
       2015-08-10  973.6
201604 2016-01-08  890.5
       2016-01-19  837.2
201607 2016-03-29  955.8
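
To also get the index names from the question (Contract and Date) back, the same steps can be written as a short script. This is a minimal sketch under assumptions: the column name "value" is hypothetical, since the original column name does not appear in the question.

```python
import pandas as pd
from pandas import Timestamp

# Dictionary pasted from the question: the output of Series.to_dict(),
# keyed by (Contract, Date) tuples.
d = {(201510, Timestamp('2015-07-21 00:00:00')): 987.90000000000009,
     (201510, Timestamp('2015-08-10 00:00:00')): 973.60000000000014,
     (201604, Timestamp('2016-01-08 00:00:00')): 890.5,
     (201604, Timestamp('2016-01-19 00:00:00')): 837.20000000000005,
     (201607, Timestamp('2016-03-29 00:00:00')): 955.80000000000007}

# A flat {key: scalar} dict describes a Series, not a DataFrame,
# which is why pd.DataFrame(d) raises the ValueError shown above.
s = pd.Series(d)

# to_dict() drops the index level names, so set them again by hand.
s.index.names = ["Contract", "Date"]

# "value" is a hypothetical column name chosen for this example.
df = s.to_frame(name="value")

print(df)
```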