Pandas multi-indexing succeeds when given a tuple, fails with a list

Time: 2016-07-11 21:24:37

Tags: python pandas numpy dataframe

I have data in the form of an array of lists, such as [['Manhattan', 142, 42], [...]]. I also have a pd.DataFrame with a multi-index that contains, among other things, a column called VAC.

The following raises a ValueError:

for vac_bbl in vac_bbls:
    property_profiles['VAC'][vac_bbl] = None

The traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-98-e8edfc85d7ba> in <module>()
      1 for vac_bbl in vac_bbls:
----> 2     property_profiles['VAC'][vac_bbl] = None

C:\Anaconda3\envs\test\lib\site-packages\pandas\core\series.py in __setitem__(self, key, value)
    751         # do the setitem
    752         cacher_needs_updating = self._check_is_chained_assignment_possible()
--> 753         setitem(key, value)
    754         if cacher_needs_updating:
    755             self._maybe_update_cacher()

C:\Anaconda3\envs\test\lib\site-packages\pandas\core\series.py in setitem(key, value)
    747                     pass
    748 
--> 749             self._set_with(key, value)
    750 
    751         # do the setitem

C:\Anaconda3\envs\test\lib\site-packages\pandas\core\series.py in _set_with(self, key, value)
    795                 self._set_values(key.astype(np.bool_), value)
    796             else:
--> 797                 self._set_labels(key, value)
    798 
    799     def _set_labels(self, key, value):

C:\Anaconda3\envs\test\lib\site-packages\pandas\core\series.py in _set_labels(self, key, value)
    805         mask = indexer == -1
    806         if mask.any():
--> 807             raise ValueError('%s not contained in the index' % str(key[mask]))
    808         self._set_values(indexer, value)
    809 

ValueError: ['Manhattan' 1750.0 53.0] not contained in the index

However, the following works fine:

for vac_bbl in vac_bbls:
    property_profiles['VAC'][tuple(vac_bbl)] = None

Why is this?

1 answer:

Answer 0 (score: 4)

pandas treats a list in this context as a collection of separate labels: each item in the list is looked up on its own, and the result holds one entry per item. That is why the error lists 'Manhattan', 1750.0 and 53.0 as not contained in the index. A tuple, by contrast, is a single key whose elements name the successive levels of the multi-index. This is the expected behaviour.

You can also pass a list of tuples, which selects several entries at once, one for each tuple.
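The difference between the three key types can be sketched on a small stand-in frame. The level names (borough, block, lot) and the values are made up for illustration and are not taken from the question's data. The exact exception raised for the plain list depends on the pandas version (ValueError in the traceback above, KeyError in recent releases), so the sketch catches both. It also assigns through a single .loc call, which is an assumption about what the asker wants rather than the original code.

```python
import pandas as pd

# Hypothetical stand-in for property_profiles: three index levels, one VAC column.
index = pd.MultiIndex.from_tuples(
    [("Brooklyn", 7, 1), ("Manhattan", 142, 42), ("Manhattan", 1750, 53)],
    names=["borough", "block", "lot"],  # level names are assumptions
)
property_profiles = pd.DataFrame({"VAC": [1.0, 2.0, 3.0]}, index=index)
vac = property_profiles["VAC"]

vac_bbl = ["Manhattan", 1750, 53]

# A tuple is one key that spans all three index levels.
single_value = vac.loc[tuple(vac_bbl)]

# A list is read as three separate labels; 1750 and 53 are not
# labels on their own, so the lookup fails.
try:
    vac.loc[vac_bbl]
    list_lookup_failed = False
except (KeyError, ValueError):
    list_lookup_failed = True

# A list of tuples selects one entry per tuple.
two_rows = vac.loc[[("Manhattan", 142, 42), ("Manhattan", 1750, 53)]]

# Assigning through a single .loc call with the tuple as the row key.
property_profiles.loc[tuple(vac_bbl), "VAC"] = None
```

Using one .loc call with the tuple as the row key and 'VAC' as the column key avoids the chained property_profiles['VAC'][...] assignment, which may or may not write through to the original frame depending on the pandas version.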