奇怪"重新索引错误"将Series转换为DataFrame

Date: 2016-10-17 15:12:36

Tags: python-3.x pandas dataframe

I have two Series objects that, as far as I can tell, look identical except for the data they contain. I am trying to convert them to DataFrames and put them side by side as separate columns of the same DataFrame. For some reason I cannot understand, one of the Series converts happily, while the other refuses to convert whenever it is placed inside a container (a list or a dict). I get a reindexing error, yet there are no duplicates in the Series' index.

import pickle
import pandas as pd


s1 = pickle.load(open('s1.p', 'rb'))
s2 = pickle.load(open('s2.p', 'rb'))
print(s1.head(10))
print(s2.head(10))

pd.DataFrame(s1)  # <--- works fine
pd.DataFrame(s2)  # <--- works fine
pd.DataFrame([s1])  # <--- works fine
# pd.DataFrame([s2])  # <--- doesn't work
# pd.DataFrame([s1, s2])  # <--- doesn't work
pd.DataFrame({s1.name: s1})  # <--- works fine
pd.DataFrame({s2.name: s2})  # <--- works fine
pd.DataFrame({s1.name: s1, s2.name: s1})  # <--- works fine
# pd.DataFrame({s1.name: s1, s2.name: s2})  # <--- doesn't work

Here is the output. Although you cannot see it here, there is overlap between the index values; they are just in a different order. I expect the indexes to be matched up when the Series are combined into the same DataFrame.

id
801120    42.01
801138    50.18
801139    50.01
802101    53.77
802110    56.52
802112    47.37
802113    46.52
802114    46.58
802115    42.59
802117    40.85
Name: age, dtype: float64
id
A32067    0.39083
A32195    0.28506
A01685    0.36432
A11124    0.55649
A32020    0.41524
A32021    0.43788
A32098    0.49206
A00699    0.37515
A32158    0.58793
A14139    0.47413
Name: lh_vtx_000001, dtype: float64
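
A quick diagnostic can test the "no duplicates" claim (a hypothetical check, not from the original question; pd.isnull is used rather than Index.isna because the latter did not exist in pandas of this era). Note that NaN labels count toward uniqueness: an index containing two NaN entries is treated as non-unique.

# Hypothetical diagnostic: check whether each index is truly unique.
# An index with repeated NaN labels counts as containing duplicates.
for name, s in [('s1', s1), ('s2', s2)]:
    print(name, 'unique index:', s.index.is_unique,
          'NaN labels:', pd.isnull(s.index).sum())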

Traceback when the last line is uncommented:

Traceback (most recent call last):
  File "/Users/sm2286/Documents/Vertex/test.py", line 18, in <module>
    pd.DataFrame({s1.name: s1, s2.name: s2})  # <--- doesn't work
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 224, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 360, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5236, in _arrays_to_mgr
    arrays = _homogenize(arrays, index, dtype)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5534, in _homogenize
    v = v.reindex(index, copy=False)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 2287, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2229, in reindex
    fill_value, copy).__finalize__(self)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2247, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2341, in _reindex_with_indexers
    copy=copy)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 3586, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2293, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Traceback when line 13 is uncommented:

Traceback (most recent call last):
  File "/Users/sm2286/Documents/Vertex/test.py", line 13, in <module>
    pd.DataFrame([s2])  # <--- doesn't work
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 263, in __init__
    arrays, columns = _to_arrays(data, columns, dtype=dtype)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5359, in _to_arrays
    dtype=dtype)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 5453, in _list_of_series_to_arrays
    indexer = indexer_cache[id(index)] = index.get_indexer(columns)
  File "/Users/sm2286/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2082, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
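
Both errors can be reproduced with a self-contained sketch in which one Series carries repeated NaN index labels (an assumption about the cause, not data from the question; exact error messages vary between pandas versions):

import numpy as np
import pandas as pd

s_ok = pd.Series([42.01, 50.18], index=[801120, 801138], name='age')
# Two NaN labels make this index non-unique even though no visible label repeats.
s_bad = pd.Series([0.39083, 0.28506, 0.36432],
                  index=['A32067', np.nan, np.nan], name='lh_vtx_000001')

try:
    pd.DataFrame([s_bad])
except Exception as e:
    print(type(e).__name__, e)  # InvalidIndexError: Reindexing only valid ...

try:
    pd.DataFrame({s_ok.name: s_ok, s_bad.name: s_bad})
except Exception as e:
    print(type(e).__name__, e)  # ValueError: cannot reindex from a duplicate axis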

1 Answer:

Answer 0 (score: 0)

After more investigation, the difference between the Series turned out to be that the latter contained missing values. Removing them solved the problem.
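
This fits the traceback: a NaN label can appear in an index more than once, and pandas then treats the whole index as containing duplicates, which would explain the duplicate-axis error even when no two visible labels match. A minimal sketch of the fix, assuming that diagnosis (the original pickled data is not available):

# Sketch of the fix the answer describes (hypothetical reconstruction):
s2_clean = s2.dropna()                           # drop rows with NaN values
s2_clean = s2_clean[pd.notnull(s2_clean.index)]  # drop rows with NaN index labels
df = pd.DataFrame({s1.name: s1, s2_clean.name: s2_clean})

With both indexes unique, the dict-of-Series constructor aligns on the union of the two indexes, filling NaN where one Series has no value for a given label.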