I have a DataFrame df whose index is:
df.index
Out[4]:
Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
u'2015-03-30_p001_2', u'2015-03-31_p001_2',
u'2015-03-31_p002_3', u'2015-04-01_p001_2',
u'2015-04-01_p002_3', u'2015-04-02_p001_2',
u'2015-04-02_p002_3', u'2015-04-03_p001_2',
...
u'2016-03-31_p127_1', u'2016-04-01_p127_1',
u'2016-04-01_p128_3', u'2016-04-02_p127_1',
u'2016-04-02_p128_3', u'2016-04-03_p127_1',
u'2016-04-03_p128_3', u'2016-04-04_p127_1',
u'2016-04-05_p127_1', u'2016-04-06_p127_1'],
dtype='object', length=781)
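The DataFrame df is the result of merging two DataFrames.
As you can see from the index, it is not sorted the way I need. For example, '2015-03-31_p002_3' (5th position) comes before '2015-04-01_p001_2' (6th position).
I want to group all the _p001_2 entries together and sort them by date, then do the same for all the _p002_3 entries, and so on.
But I have not managed to do this...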
Answer 0 (score: 0)
If sort_index does not give the order you need, it gets a bit more complicated: use split to create a helper DataFrame, then sort_values, and finally reindex:
import pandas as pd

idx = pd.Index([u'2015-03-28_p001_2', u'2015-03-29_p001_2',
u'2015-03-30_p001_2', u'2015-03-31_p001_2',
u'2015-03-31_p002_3', u'2015-04-01_p001_2',
u'2015-04-01_p002_3', u'2015-04-02_p001_2',
u'2015-04-02_p002_3', u'2015-04-03_p001_2',
u'2016-03-31_p127_1', u'2016-04-01_p127_1',
u'2016-04-01_p128_3', u'2016-04-02_p127_1',
u'2016-04-02_p128_3', u'2016-04-03_p127_1',
u'2016-04-03_p128_3', u'2016-04-04_p127_1',
u'2016-04-05_p127_1', u'2016-04-06_p127_1'])
df = pd.DataFrame({'a':range(len(idx))}, index=idx)
print (df)
a
2015-03-28_p001_2 0
2015-03-29_p001_2 1
2015-03-30_p001_2 2
2015-03-31_p001_2 3
2015-03-31_p002_3 4
2015-04-01_p001_2 5
2015-04-01_p002_3 6
2015-04-02_p001_2 7
2015-04-02_p002_3 8
2015-04-03_p001_2 9
2016-03-31_p127_1 10
2016-04-01_p127_1 11
2016-04-01_p128_3 12
2016-04-02_p127_1 13
2016-04-02_p128_3 14
2016-04-03_p127_1 15
2016-04-03_p128_3 16
2016-04-04_p127_1 17
2016-04-05_p127_1 18
2016-04-06_p127_1 19
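# sort_index() compares the labels as plain strings, so the date prefix
# decides the order and the rows stay where they were
df = df.sort_index()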
print (df)
a
2015-03-28_p001_2 0
2015-03-29_p001_2 1
2015-03-30_p001_2 2
2015-03-31_p001_2 3
2015-03-31_p002_3 4
2015-04-01_p001_2 5
2015-04-01_p002_3 6
2015-04-02_p001_2 7
2015-04-02_p002_3 8
2015-04-03_p001_2 9
2016-03-31_p127_1 10
2016-04-01_p127_1 11
2016-04-01_p128_3 12
2016-04-02_p127_1 13
2016-04-02_p128_3 14
2016-04-03_p127_1 15
2016-04-03_p128_3 16
2016-04-04_p127_1 17
2016-04-05_p127_1 18
2016-04-06_p127_1 19
df1 = df.index.to_series().str.split('_', expand=True)
df1[0] = pd.to_datetime(df1[0])
# if necessary, change the order of the columns used for sorting
df1 = df1.sort_values(by=[1,2,0])
print (df1)
0 1 2
2015-03-28_p001_2 2015-03-28 p001 2
2015-03-29_p001_2 2015-03-29 p001 2
2015-03-30_p001_2 2015-03-30 p001 2
2015-03-31_p001_2 2015-03-31 p001 2
2015-04-01_p001_2 2015-04-01 p001 2
2015-04-02_p001_2 2015-04-02 p001 2
2015-04-03_p001_2 2015-04-03 p001 2
2015-03-31_p002_3 2015-03-31 p002 3
2015-04-01_p002_3 2015-04-01 p002 3
2015-04-02_p002_3 2015-04-02 p002 3
2016-03-31_p127_1 2016-03-31 p127 1
2016-04-01_p127_1 2016-04-01 p127 1
2016-04-02_p127_1 2016-04-02 p127 1
2016-04-03_p127_1 2016-04-03 p127 1
2016-04-04_p127_1 2016-04-04 p127 1
2016-04-05_p127_1 2016-04-05 p127 1
2016-04-06_p127_1 2016-04-06 p127 1
2016-04-01_p128_3 2016-04-01 p128 3
2016-04-02_p128_3 2016-04-02 p128 3
2016-04-03_p128_3 2016-04-03 p128 3
df = df.reindex(df1.index)
print (df)
a
2015-03-28_p001_2 0
2015-03-29_p001_2 1
2015-03-30_p001_2 2
2015-03-31_p001_2 3
2015-04-01_p001_2 5
2015-04-02_p001_2 7
2015-04-03_p001_2 9
2015-03-31_p002_3 4
2015-04-01_p002_3 6
2015-04-02_p002_3 8
2016-03-31_p127_1 10
2016-04-01_p127_1 11
2016-04-02_p127_1 13
2016-04-03_p127_1 15
2016-04-04_p127_1 17
2016-04-05_p127_1 18
2016-04-06_p127_1 19
2016-04-01_p128_3 12
2016-04-02_p128_3 14
2016-04-03_p128_3 16
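The steps above can be wrapped in a small helper. This is only a sketch: the function name sort_by_group_then_date and the four-row sample are made up for illustration, and it assumes every label is unique and has exactly the form date_pNNN_N.

```python
import pandas as pd


def sort_by_group_then_date(df):
    """Group rows by the parts after the date, then sort each group by date.

    Assumes unique index labels shaped like '2015-03-31_p002_3'.
    """
    # Helper frame: column 0 = date, column 1 = 'p002', column 2 = '3'
    parts = df.index.to_series().str.split('_', expand=True)
    parts[0] = pd.to_datetime(parts[0])
    # Sort by the group columns first, then by date inside each group
    ordered_labels = parts.sort_values(by=[1, 2, 0]).index
    return df.reindex(ordered_labels)


# Hypothetical sample data, deliberately shuffled
sample_idx = ['2015-03-31_p002_3', '2015-04-01_p001_2',
              '2015-03-31_p001_2', '2015-04-01_p002_3']
small = pd.DataFrame({'a': range(len(sample_idx))}, index=sample_idx)
result = sort_by_group_then_date(small)
print(result)
```

Note that columns 1 and 2 are compared as strings, so this relies on the group codes being zero-padded to the same width, as they are in the question.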
Edit:
If the index contains duplicates, you need to create new columns, sort by them, and finally drop them:
df[[0,1,2]] = df.index.to_series().str.split('_', expand=True)
df[0] = pd.to_datetime(df[0])
df = df.sort_values(by=[1,2,0])
df = df.drop([0,1,2], axis=1)
print (df)
a
2015-03-28_p001_2 0
2015-03-29_p001_2 1
2015-03-30_p001_2 2
2015-03-31_p001_2 3
2015-04-01_p001_2 5
2015-04-02_p001_2 7
2015-04-03_p001_2 9
2015-03-31_p002_3 4
2015-04-01_p002_3 6
2015-04-02_p002_3 8
2016-03-31_p127_1 10
2016-04-01_p127_1 11
2016-04-02_p127_1 13
2016-04-03_p127_1 15
2016-04-04_p127_1 17
2016-04-05_p127_1 18
2016-04-06_p127_1 19
2016-04-01_p128_3 12
2016-04-02_p128_3 14
2016-04-03_p128_3 16
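As a possible alternative for the duplicate case, the rows can be reordered by position instead of by label, which avoids adding temporary columns to df. This is a sketch under assumptions, not a tested replacement: the function name sort_with_duplicate_labels and the sample data are hypothetical, and the labels are again assumed to look like date_pNNN_N.

```python
import pandas as pd


def sort_with_duplicate_labels(df):
    """Reorder rows by group and date using positions, so repeated labels are fine."""
    # Build the helper frame on a fresh 0..n-1 index instead of the labels
    parts = pd.Series(df.index.astype(str)).str.split('_', expand=True)
    parts[0] = pd.to_datetime(parts[0])
    # After sorting, the helper's index holds the original row positions
    positions = parts.sort_values(by=[1, 2, 0]).index.to_numpy()
    return df.iloc[positions]


# Hypothetical sample with one label repeated
dup_idx = ['2015-04-01_p002_3', '2015-03-31_p001_2', '2015-04-01_p002_3',
           '2015-03-31_p002_3', '2015-04-01_p001_2']
dup_df = pd.DataFrame({'a': range(len(dup_idx))}, index=dup_idx)
dup_sorted = sort_with_duplicate_labels(dup_df)
print(dup_sorted)
```

Because iloc selects by position, each row appears exactly once in the result even when its label is repeated, whereas reindex would raise an error on duplicate labels.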