I have two DataFrames, c and h, shown below.
c pickle file: http://s000.tinyupload.com/?file_id=64255815375060941529
h pickle file: http://s000.tinyupload.com/?file_id=98284988001290720556
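When I call c.append(h), I get TypeError: data type not understood, but only when I run pandas 0.17.1. If I run the same code in pandas 0.14.1, the DataFrames are appended correctly. What is going on, and how can I modify my DataFrames so that they append correctly in 0.17.1?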
Edit: here are the heads of the DataFrames:
In [49]: h.head(3)
Out[49]:
report_id adv_firm_key manager_id filing_manager_name \
0 45497 105129 20984 Bridgewater Associates, LP
1 45497 105129 20984 Bridgewater Associates, LP
2 45497 105129 20984 Bridgewater Associates, LP
report_period issuer_name cusip position_value quantity \
0 2015-12-31 ABBOTT LABS 002824100 1745000 38857
1 2015-12-31 ACCENTURE PLC IRELAND G1151C101 512000 4900
2 2015-12-31 ADOBE SYS INC 00724F101 9157000 97479
principal_type put_or_call sector total_holding_value \
0 SH X Health Care 7707722000
1 SH X Information Technology 7707722000
2 SH X Information Technology 7707722000
total_holding_value_calculated market_cap shares_float beta symbol \
0 7707722000 66993140300 1488070000 0.924138 ABT
1 7707722000 67773564900 626355000 0.985543 ACN
2 7707722000 46848347700 496787000 1.099186 ADBE
allocation portfolio_value
0 300000 2000000
1 300000 2000000
2 300000 2000000
In [50]: c.head(3)
Out[50]:
put_or_call position_value report_date fund_id report_period \
0 X 10000 2015-11-02 502 2015-12-31
1 X 10000 2015-11-02 502 2015-12-31
2 X 10000 2015-11-02 502 2015-12-31
underlying_id quantity side created_at report_id \
0 1001 5 Short 2016-03-16 17:31:57.003792+00:00 NaN
1 1001 5 Short 2016-03-16 17:31:57.003792+00:00 NaN
2 1001 5 Short 2016-03-16 17:31:57.003792+00:00 NaN
... adv_firm_key filing_manager_name symbol \
0 ... 155680 Davidson Kempner Capital Management LP AAOI
1 ... 155680 Davidson Kempner Capital Management LP AAOI
2 ... 155680 Davidson Kempner Capital Management LP AAOI
sector cusip issuer_name \
0 Telecommunication Services 03823U102 APPLIED OPTOELECTRONICS INC
1 Telecommunication Services 03823U102 APPLIED OPTOELECTRONICSINC COM
2 Telecommunication Services 03823U102 APPLIED OPTOELECTRONICS INC
principal_type market_cap shares_float beta
0 SH 288734200 14566500 1.45758
1 SH 288734200 14566500 1.45758
2 SH 288734200 14566500 1.45758
[3 rows x 21 columns]
Edit 2: here is the stack trace:
In [11]: pd.concat([c,h])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-943f474750e7> in <module>()
----> 1 pd.concat([c,h])
/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/tools/merge.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
833 verify_integrity=verify_integrity,
834 copy=copy)
--> 835 return op.get_result()
836
837
/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/tools/merge.py in get_result(self)
1023 new_data = concatenate_block_managers(
1024 mgrs_indexers, self.new_axes,
-> 1025 concat_axis=self.axis, copy=self.copy)
1026 if not self.copy:
1027 new_data._consolidate_inplace()
/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
4472 copy=copy),
4473 placement=placement)
-> 4474 for placement, join_units in concat_plan]
4475
4476 return BlockManager(blocks, axes)
/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
4569 to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
4570 upcasted_na=upcasted_na)
-> 4571 for ju in join_units]
4572
4573 if len(to_concat) == 1:
/usr/local/miniconda/envs/analytics-env/lib/python2.7/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
4823 if self.is_null and not getattr(self.block, 'is_categorical',
4824 None):
-> 4825 missing_arr = np.empty(self.shape, dtype=empty_dtype)
4826 if np.prod(self.shape):
4827 # NumPy 1.6 workaround: this statement gets strange if all
TypeError: data type not understood
Answer 0 (score: 1)
This is bug 11351 (improper handling).
If you try to add the new column created_at, which is missing in h, and then concat:
h['created_at'] = np.nan
new = pd.concat([h,c])
you get this error:
AttributeError: 'numpy.ndarray' object has no attribute 'tz_localize'
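The tz_localize in that message suggests the timezone-aware created_at column is involved (its values in c.head(3) end in +00:00). As a minimal sketch for spotting such columns, assuming a recent pandas release and a small made-up stand-in frame rather than the real pickles, with tz_aware_columns being a hypothetical helper name:

```python
import pandas as pd


def tz_aware_columns(df):
    """Return the names of columns whose dtype is a timezone-aware datetime.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return [name for name, dtype in df.dtypes.items()
            if isinstance(dtype, pd.DatetimeTZDtype)]


# Made-up stand-in for c: one tz-aware column and one plain integer column.
c_demo = pd.DataFrame({
    "quantity": [5, 5],
    "created_at": pd.to_datetime(
        ["2016-03-16 17:31:57.003792", "2016-03-16 17:31:57.003792"]
    ).tz_localize("UTC"),
})

tz_cols = tz_aware_columns(c_demo)
print(tz_cols)
```

Any column this reports in one frame but not the other is a candidate for the string conversion.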
One workaround is to convert the datetime column to strings before concatenating, then parse it back into datetimes afterwards:
c['created_at'] = c['created_at'].astype(str)
new = pd.concat([h,c])
new['created_at'] = pd.to_datetime(new['created_at'])
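For reference, here is the same workaround as a self-contained sketch. The frames c_demo and h_demo are small made-up stand-ins for the pickled data, and ignore_index=True and utc=True are my own additions (the latter so that the parsed column comes back timezone-aware). A recent pandas may concatenate these frames without any workaround, so treat this as an illustration of the steps rather than something verified against 0.17.1.

```python
import pandas as pd

# Made-up stand-in for c: includes a tz-aware created_at column.
c_demo = pd.DataFrame({
    "quantity": [5, 5],
    "created_at": pd.to_datetime(
        ["2016-03-16 17:31:57.003792"] * 2
    ).tz_localize("UTC"),
})

# Made-up stand-in for h: has no created_at column at all.
h_demo = pd.DataFrame({"quantity": [38857, 4900]})

# Step 1: turn the tz-aware datetimes into plain strings.
c_demo["created_at"] = c_demo["created_at"].astype(str)

# Step 2: concatenate; rows coming from h_demo get NaN in created_at.
new = pd.concat([h_demo, c_demo], ignore_index=True)

# Step 3: parse the strings back; NaN becomes NaT.
new["created_at"] = pd.to_datetime(new["created_at"], utc=True)

print(new.dtypes)
```

The rows that came from h_demo end up with NaT in created_at, which is usually what you want for a column that frame never had.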