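I'm having trouble converting a DataFrame with to_panel. I do some preprocessing on the data first, and I'm worried that it may be causing the problem.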
merge.head()
date catcode type di cid feccandid amount disposition bills
0 2005-12-31 G1100 24K D N00004045 H2MI11042 1500 support 1
1 2005-12-31 L1100 24K D N00004045 H2MI11042 8000 support 1
2 2005-12-31 L1100 24K D N00004155 H2MI02066 1000 oppose 1
3 2005-12-31 T1200 24K D N00004166 H4MI03045 3000 support 1
Then I build a pivot_table:
mm = merge.pivot_table(index=['date', 'feccandid', 'disposition', \
'bills', 'cid', 'di', 'type'], columns='catcode',values='amount', \
fill_value=0)
catcode A0000 A1000 A1100 A1200 A1300 A1400 A1500 A1600 A2000 A2300 ... T9100 T9400 X3700 X4000 X4100 X4110 X5000 X7000 Y0000 Z5200
date feccandid disposition bills cid di type
2005-12-31 H2MI02066 oppose 1 N00004155 D 24K 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
H2MI11042 support 1 N00004045 D 24K 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
H4MI03045 support 1 N00004166 D 24K 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 rows × 315 columns
Then I reset the index:
mm = mm.reset_index()
mm.head()
catcode date feccandid disposition bills cid di type A0000 A1000 A1100 ... T9100 T9400 X3700 X4000 X4100 X4110 X5000 X7000 Y0000 Z5200
0 2005-12-31 H2MI02066 oppose 1 N00004155 D 24K 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 2005-12-31 H2MI11042 support 1 N00004045 D 24K 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 2005-12-31 H4MI03045 support 1 N00004166 D 24K 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Then I write it to CSV:
mm.to_csv('i.test', index=False)
Read it back from CSV:
hh = pd.read_csv('i.test')
Set the index:
hh.set_index(['date', 'feccandid']).head(3)
disposition bills cid di type A0000 A1000 A1100 A1200 A1300 ... T9100 T9400 X3700 X4000 X4100 X4110 X5000 X7000 Y0000 Z5200
date feccandid
2005-12-31 H2MI02066 oppose 1 N00004155 D 24K 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
H2MI11042 support 1 N00004045 D 24K 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
H4MI03045 support 1 N00004166 D 24K 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Convert to a Panel:
hh.to_panel()
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-86-9358192e71a3> in <module>()
----> 1 hh.to_panel()
/home/jayaramdas/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in to_panel(self)
1210 if (not isinstance(self.index, MultiIndex) or # pragma: no cover
1211 len(self.index.levels) != 2):
-> 1212 raise NotImplementedError('Only 2-level MultiIndex are supported.')
1213
1214 if not self.index.is_unique:
NotImplementedError: Only 2-level MultiIndex are supported.
Any ideas, questions, or criticism?
Answer 0 (score: 2)
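set_index does not operate in place, so your hh still does not have a MultiIndex as its index. Calling to_panel on it fails, while calling it on the result of set_index works: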
>>> hh.to_panel()
Traceback (most recent call last):
File "<ipython-input-4-9358192e71a3>", line 1, in <module>
hh.to_panel()
File "/home/dsm/sys/pys/3.5.1/lib/python3.5/site-packages/pandas/core/frame.py", line 1224, in to_panel
raise NotImplementedError('Only 2-level MultiIndex are supported.')
NotImplementedError: Only 2-level MultiIndex are supported.
>>> hh.set_index(["date", "feccandid"]).to_panel()
<class 'pandas.core.panel.Panel'>
Dimensions: 20 (items) x 1 (major_axis) x 3 (minor_axis)
Items axis: catcode to Z5200
Major_axis axis: 2005-12-31 to 2005-12-31
Minor_axis axis: H2MI02066 to H4MI03045
You could add inplace=True to set_index, but simply reassigning with hh = hh.set_index(...) works just as well.
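To make the point concrete, here is a minimal sketch of the difference between calling set_index and keeping its result. The small frame is a hypothetical stand-in for the one read back from the CSV (only a few of the original columns, with illustrative values), and it avoids to_panel because that method is not available in recent pandas releases.

```python
import pandas as pd

# Hypothetical stand-in for the frame read back from CSV.
hh = pd.DataFrame({
    "date": ["2005-12-31"] * 3,
    "feccandid": ["H2MI02066", "H2MI11042", "H4MI03045"],
    "disposition": ["oppose", "support", "support"],
    "A0000": [0, 0, 0],
})

# set_index returns a new DataFrame; discarding the result leaves hh untouched.
hh.set_index(["date", "feccandid"])
unchanged = isinstance(hh.index, pd.MultiIndex)

# Reassigning keeps the two-level index that to_panel was asking for.
hh = hh.set_index(["date", "feccandid"])
changed = isinstance(hh.index, pd.MultiIndex)
n_levels = hh.index.nlevels

print(unchanged, changed, n_levels)
```

Checking `hh.index.nlevels` before the conversion is a quick way to confirm that the index really has the two levels the error message refers to.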
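As an aside: I believe Panel is gradually being deprecated in favor of the richer N-d objects in xarray, so you may want to consider installing xarray and then running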
>>> hh.to_xarray()
<xarray.Dataset>
Dimensions: (date: 1, feccandid: 3)
Coordinates:
* date (date) object '2005-12-31'
* feccandid (feccandid) object 'H2MI02066' 'H2MI11042' 'H4MI03045'
Data variables:
catcode (date, feccandid) int64 0 1 2
disposition (date, feccandid) object 'oppose' 'support' 'support'
bills (date, feccandid) int64 1 1 1
cid (date, feccandid) object 'N00004155' 'N00004045' 'N00004166'
di (date, feccandid) object 'D' 'D' 'D'
type (date, feccandid) object '24K' '24K' '24K'
[...]
and experimenting that way instead.