Sorting the index of a pandas DataFrame

Time: 2017-08-09 15:00:09

Tags: python pandas indexing

I have a DataFrame df whose index is

df.index
Out[4]: 
Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
       u'2015-03-30_p001_2', u'2015-03-31_p001_2',
       u'2015-03-31_p002_3', u'2015-04-01_p001_2',
       u'2015-04-01_p002_3', u'2015-04-02_p001_2',
       u'2015-04-02_p002_3', u'2015-04-03_p001_2',
       ...
       u'2016-03-31_p127_1', u'2016-04-01_p127_1',
       u'2016-04-01_p128_3', u'2016-04-02_p127_1',
       u'2016-04-02_p128_3', u'2016-04-03_p127_1',
       u'2016-04-03_p128_3', u'2016-04-04_p127_1',
       u'2016-04-05_p127_1', u'2016-04-06_p127_1'],
      dtype='object', length=781)

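The DataFrame df is the result of merging two DataFrames.

As you can see from the index, it is not in the order I need. For example, '2015-03-31_p002_3' (5th position) comes before '2015-04-01_p001_2' (6th position).

I would like to group all the _p001_2 entries together and sort them by date, then do the same for all the _p002_3 entries, and so on.

But I have not managed to do this...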

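1 Answer:

Answer 0: (score: 0)

If sort_index cannot be used (it compares the labels as plain strings, so the leading date decides the order), this is a bit more complicated: you need split to create a helper DataFrame, then sort_values, and finally reindex.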

import pandas as pd

idx = pd.Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
       u'2015-03-30_p001_2', u'2015-03-31_p001_2',
       u'2015-03-31_p002_3', u'2015-04-01_p001_2',
       u'2015-04-01_p002_3', u'2015-04-02_p001_2',
       u'2015-04-02_p002_3', u'2015-04-03_p001_2',

       u'2016-03-31_p127_1', u'2016-04-01_p127_1',
       u'2016-04-01_p128_3', u'2016-04-02_p127_1',
       u'2016-04-02_p128_3', u'2016-04-03_p127_1',
       u'2016-04-03_p128_3', u'2016-04-04_p127_1',
       u'2016-04-05_p127_1', u'2016-04-06_p127_1'])

df = pd.DataFrame({'a':range(len(idx))}, index=idx)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-03-31_p002_3   4
2015-04-01_p001_2   5
2015-04-01_p002_3   6
2015-04-02_p001_2   7
2015-04-02_p002_3   8
2015-04-03_p001_2   9
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-01_p128_3  12
2016-04-02_p127_1  13
2016-04-02_p128_3  14
2016-04-03_p127_1  15
2016-04-03_p128_3  16
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
df = df.sort_index()
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-03-31_p002_3   4
2015-04-01_p001_2   5
2015-04-01_p002_3   6
2015-04-02_p001_2   7
2015-04-02_p002_3   8
2015-04-03_p001_2   9
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-01_p128_3  12
2016-04-02_p127_1  13
2016-04-02_p128_3  14
2016-04-03_p127_1  15
2016-04-03_p128_3  16
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
df1 = df.index.to_series().str.split('_', expand=True)
df1[0] = pd.to_datetime(df1[0])
# if necessary, change the order of the columns used for sorting
df1 = df1.sort_values(by=[1,2,0])
print (df1)
                           0     1  2
2015-03-28_p001_2 2015-03-28  p001  2
2015-03-29_p001_2 2015-03-29  p001  2
2015-03-30_p001_2 2015-03-30  p001  2
2015-03-31_p001_2 2015-03-31  p001  2
2015-04-01_p001_2 2015-04-01  p001  2
2015-04-02_p001_2 2015-04-02  p001  2
2015-04-03_p001_2 2015-04-03  p001  2
2015-03-31_p002_3 2015-03-31  p002  3
2015-04-01_p002_3 2015-04-01  p002  3
2015-04-02_p002_3 2015-04-02  p002  3
2016-03-31_p127_1 2016-03-31  p127  1
2016-04-01_p127_1 2016-04-01  p127  1
2016-04-02_p127_1 2016-04-02  p127  1
2016-04-03_p127_1 2016-04-03  p127  1
2016-04-04_p127_1 2016-04-04  p127  1
2016-04-05_p127_1 2016-04-05  p127  1
2016-04-06_p127_1 2016-04-06  p127  1
2016-04-01_p128_3 2016-04-01  p128  3
2016-04-02_p128_3 2016-04-02  p128  3
2016-04-03_p128_3 2016-04-03  p128  3
df = df.reindex(df1.index)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-04-01_p001_2   5
2015-04-02_p001_2   7
2015-04-03_p001_2   9
2015-03-31_p002_3   4
2015-04-01_p002_3   6
2015-04-02_p002_3   8
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-02_p127_1  13
2016-04-03_p127_1  15
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
2016-04-01_p128_3  12
2016-04-02_p128_3  14
2016-04-03_p128_3  16
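
The three steps above (split, sort_values, reindex) can be wrapped in a small helper. This is only a sketch: the function name `sort_index_by_parts` and the column names `date`, `pid` and `suffix` are made up for illustration, and it assumes every label has exactly the form `date_id_suffix` and that the index has no duplicate labels.

```python
import pandas as pd


def sort_index_by_parts(df, order=("pid", "suffix", "date")):
    """Reorder rows by the parts of 'date_id_suffix' index labels.

    Hypothetical helper: assumes unique labels with exactly two underscores.
    """
    # Split each label into its three parts: one column per part.
    parts = df.index.to_series().str.split("_", expand=True)
    parts.columns = ["date", "pid", "suffix"]
    # Parse the date so it is compared as a date, not as text.
    parts["date"] = pd.to_datetime(parts["date"])
    # The helper frame keeps the original labels as its index,
    # so its sorted index is the new row order.
    sorted_labels = parts.sort_values(by=list(order)).index
    return df.reindex(sorted_labels)


idx = pd.Index(["2015-03-31_p002_3", "2015-04-01_p001_2",
                "2015-03-28_p001_2", "2015-04-01_p002_3"])
demo = pd.DataFrame({"a": range(len(idx))}, index=idx)

by_id = sort_index_by_parts(demo)  # group by id, then by date
by_date = sort_index_by_parts(demo, order=("date", "pid", "suffix"))
print(by_id)
print(by_date)
```

Note that `pid` and `suffix` are still compared as strings here, which works for zero-padded ids such as p001 and p127 but would need a numeric conversion for ids of varying width.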

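Edit:

If the index contains duplicates, reindex cannot be used (it raises an error on duplicate labels), so you need to create new columns, sort by them, and finally drop them: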

df[[0,1,2]] = df.index.to_series().str.split('_', expand=True)
df[0] = pd.to_datetime(df[0])
df = df.sort_values(by=[1,2,0])
df = df.drop([0,1,2], axis=1)
print (df)
                    a
2015-03-28_p001_2   0
2015-03-29_p001_2   1
2015-03-30_p001_2   2
2015-03-31_p001_2   3
2015-04-01_p001_2   5
2015-04-02_p001_2   7
2015-04-03_p001_2   9
2015-03-31_p002_3   4
2015-04-01_p002_3   6
2015-04-02_p002_3   8
2016-03-31_p127_1  10
2016-04-01_p127_1  11
2016-04-02_p127_1  13
2016-04-03_p127_1  15
2016-04-04_p127_1  17
2016-04-05_p127_1  18
2016-04-06_p127_1  19
2016-04-01_p128_3  12
2016-04-02_p128_3  14
2016-04-03_p128_3  16
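
A possible variation on the duplicate-safe approach avoids temporary columns by sorting row positions instead of labels. This is a sketch, not part of the original answer: the name `sort_rows_by_index_parts` and the column names are hypothetical, and every label is assumed to have the form `date_id_suffix`.

```python
import pandas as pd


def sort_rows_by_index_parts(df):
    """Reorder rows by (id, suffix, date); duplicate labels are allowed.

    Hypothetical helper: assumes labels with exactly two underscores.
    """
    parts = df.index.to_series().str.split("_", expand=True)
    parts.columns = ["date", "pid", "suffix"]
    parts["date"] = pd.to_datetime(parts["date"])
    # Replace the labels with 0..n-1 so the sorted index holds row positions.
    parts = parts.reset_index(drop=True)
    positions = parts.sort_values(by=["pid", "suffix", "date"]).index.to_numpy()
    # Positional selection does not care whether labels repeat.
    return df.iloc[positions]


idx = pd.Index(["2015-04-01_p002_3", "2015-03-28_p001_2",
                "2015-04-01_p002_3", "2015-03-29_p001_2"])
dup = pd.DataFrame({"a": range(len(idx))}, index=idx)
out = sort_rows_by_index_parts(dup)
print(out)
```

Because the rows are picked with iloc, the original frame is never given extra columns, so there is nothing to drop afterwards.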