How to sort an indexed DataFrame

Asked: 2016-11-22 02:46:43

Tags: python sorting pandas dataframe categorical-data

Unlike the question "sort pandas dataframe based on list", I have an indexed DataFrame (built with pivot) like this:

$ echo -e 'abc\txyz\t0.9\nefg\txyz\t0.3\nlmn\topq\t0.23\nabc\tjkl\t0.5\n' > test.txt
$ cat test.txt
abc xyz 0.9
efg xyz 0.3
lmn opq 0.23
abc jkl 0.5
$ python

>>> import pandas as pd
>>> df = pd.read_csv('test.txt', delimiter='\t', header=None, dtype={0:unicode, 1:unicode, 2:float})
>>> df = df.pivot(index=0, columns=1, values=2)
>>> df = df.fillna(0)
>>> df
1    jkl   opq  xyz
0                  
abc  0.5  0.00  0.9
efg  0.0  0.00  0.3
lmn  0.0  0.23  0.0

I can't figure out how to use Categorical in this case:

# Desired row order.
>>> row_order = ['efg', 'abc', 'lmn']
# Desired column order.
>>> col_order = ['xyz', 'jkl', 'opq']
>>> pd.Categorical(df[0], categories=row_order, ordered=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2059, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2066, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1386, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3541, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4443)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687)
KeyError: 0

I need the indexed DataFrame with its rows and columns in the following order:

1    xyz  jkl   opq
0                  
efg  0.3  0.0  0.00
abc  0.9  0.5  0.00
lmn  0.0  0.0  0.23

1 Answer:

Answer 0: (score: 1)

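df.reindex can reorder both the rows and the columns in one call. Pass the desired row labels as index and the desired column labels as columns: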

In [261]: df.reindex(index=row_order, columns=col_order)
Out[261]: 
1    xyz  jkl   opq
0                  
efg  0.3  0.0  0.00
abc  0.9  0.5  0.00
lmn  0.0  0.0  0.23
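
For completeness, here is a self-contained sketch of the same idea that should run on Python 3 with a recent pandas. It is a minimal example, not the only way to do this: the data is read from an in-memory string instead of test.txt, and the variable names by_reindex, by_loc and by_cat are just illustrative. It also shows why the attempt in the question failed: after pivot, column 0 is no longer a column but the index, so df[0] raises KeyError and the labels have to be taken from df.index instead.

```python
import io

import pandas as pd

# Same tab-separated data as test.txt in the question, kept in memory
# so the example does not depend on a file on disk.
data = "abc\txyz\t0.9\nefg\txyz\t0.3\nlmn\topq\t0.23\nabc\tjkl\t0.5\n"
df = pd.read_csv(io.StringIO(data), sep="\t", header=None)
df = df.pivot(index=0, columns=1, values=2).fillna(0)

row_order = ["efg", "abc", "lmn"]
col_order = ["xyz", "jkl", "opq"]

# Option 1: reindex rows and columns in a single call.
# Labels missing from the frame would show up as NaN rows/columns.
by_reindex = df.reindex(index=row_order, columns=col_order)

# Option 2: label-based selection; gives the same result here,
# but raises KeyError if a requested label does not exist.
by_loc = df.loc[row_order, col_order]

# Option 3 (assumption: you really want the Categorical route):
# turn the index into an ordered categorical and sort by it.
# Note that the labels come from df.index, not df[0].
by_cat = df.copy()
by_cat.index = pd.CategoricalIndex(by_cat.index, categories=row_order, ordered=True)
by_cat = by_cat.sort_index()[col_order]

print(by_reindex)
```

For a plain reordering with known labels, reindex (or .loc) is the simpler choice. The categorical version is mainly worth it if the custom order should persist through later sort_index calls.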