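I am trying to sort the rows of my dataframe by index in ascending order first, but some rows share the same index. For those rows, I want them ordered by ascending value in a specific column. Here is what my dataframe looks like: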
layer row col stage conductance riverbott
row_Index
8 0.0 8.0 29.0 123.170732 1250000.0 122.170732
6 0.0 6.0 21.0 123.170732 1250000.0 122.170732
7 0.0 7.0 22.0 123.170732 1250000.0 122.170732
8 0.0 8.0 24.0 123.170732 1250000.0 122.170732
10 0.0 8.0 14.0 123.170732 1250000.0 122.170732
12 0.0 8.0 53.0 123.170732 1250000.0 122.170732
8 0.0 8.0 23.0 123.170732 1250000.0 122.170732
10 0.0 8.0 12.0 123.170732 1250000.0 122.170732
I tried:
df = df.sort_values(['col'])
df = df.sort_index()
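In my dataframe, the index values equal the values in 'row', so I also tried using df = df.sort_values(['row']) instead of df.sort_index().
However, I keep running into the same problem: the dataframe is sorted by index in ascending order (which is what I want), but the rows with a duplicated index sometimes come out with the lowest 'col' value first and sometimes with the highest 'col' value first. For example: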
layer row col stage conductance riverbott
row_Index
6 0.0 6.0 21.0 123.170732 1250000.0 122.170732
7 0.0 7.0 22.0 123.170732 1250000.0 122.170732
8 0.0 8.0 23.0 123.170732 1250000.0 122.170732
8 0.0 8.0 24.0 123.170732 1250000.0 122.170732
8 0.0 8.0 29.0 123.170732 1250000.0 122.170732
10 0.0 8.0 14.0 123.170732 1250000.0 122.170732
10 0.0 8.0 12.0 123.170732 1250000.0 122.170732
12 0.0 8.0 53.0 123.170732 1250000.0 122.170732
I would like the dataframe to be organized like this:
layer row col stage conductance riverbott
row_Index
6 0.0 6.0 21.0 123.170732 1250000.0 122.170732
7 0.0 7.0 22.0 123.170732 1250000.0 122.170732
8 0.0 8.0 23.0 123.170732 1250000.0 122.170732
8 0.0 8.0 24.0 123.170732 1250000.0 122.170732
8 0.0 8.0 29.0 123.170732 1250000.0 122.170732
10 0.0 8.0 12.0 123.170732 1250000.0 122.170732
10 0.0 8.0 14.0 123.170732 1250000.0 122.170732
12 0.0 8.0 53.0 123.170732 1250000.0 122.170732
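The reason I'm doing this is that I want to drop the duplicated indexes, keeping for each index the row with the lowest value in 'col'.
Thanks for your help.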
Answer 0 (score: 0)
Sort:
df = df.reset_index().sort_values(['row_Index', 'col']).set_index('row_Index')
Output:
layer row col stage conductance riverbott
row_Index
6 0.0 6.0 21.0 123.170732 1250000.0 122.170732
7 0.0 7.0 22.0 123.170732 1250000.0 122.170732
8 0.0 8.0 23.0 123.170732 1250000.0 122.170732
8 0.0 8.0 24.0 123.170732 1250000.0 122.170732
8 0.0 8.0 29.0 123.170732 1250000.0 122.170732
10 0.0 8.0 12.0 123.170732 1250000.0 122.170732
10 0.0 8.0 14.0 123.170732 1250000.0 122.170732
12 0.0 8.0 53.0 123.170732 1250000.0 122.170732
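As a side note, the two-step attempt in the question probably gives inconsistent results because sort_index() defaults to kind='quicksort', which is not guaranteed to be stable, so the earlier ordering by 'col' can be lost among rows with equal index values. A minimal sketch of an alternative, assuming a reduced frame that keeps only the 'col' column from the question's data: sort by 'col' first, then sort the index with a stable algorithm.

```python
import pandas as pd

# Reduced version of the question's data: only 'col' is kept (assumption for brevity).
df = pd.DataFrame(
    {"col": [29.0, 21.0, 22.0, 24.0, 14.0, 53.0, 23.0, 12.0]},
    index=pd.Index([8, 6, 7, 8, 10, 12, 8, 10], name="row_Index"),
)

# Sort by the secondary key first, then by the index using a stable sort,
# so rows with equal index values keep their ascending 'col' order.
stable_sorted = df.sort_values("col", kind="mergesort").sort_index(kind="mergesort")
print(stable_sorted)
```

Either approach should produce the same ordering; the reset_index() version states both sort keys in one call, which may be easier to read.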
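Then remove the duplicates, keeping the first row for each index (the one with the lowest 'col' after sorting):
df = df.loc[~df.index.duplicated(keep='first')]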
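For reference, here is a self-contained sketch of both steps run end to end. It assumes a reduced copy of the question's data with only the 'row' and 'col' columns; the names sorted_df and deduped are illustrative and not from the original post.

```python
import pandas as pd

# Reduced copy of the question's data ('row' and 'col' only; assumption for brevity).
df = pd.DataFrame(
    {
        "row": [8.0, 6.0, 7.0, 8.0, 8.0, 8.0, 8.0, 8.0],
        "col": [29.0, 21.0, 22.0, 24.0, 14.0, 53.0, 23.0, 12.0],
    },
    index=pd.Index([8, 6, 7, 8, 10, 12, 8, 10], name="row_Index"),
)

# Step 1: sort by index, then by 'col', by temporarily turning the index into a column.
sorted_df = df.reset_index().sort_values(["row_Index", "col"]).set_index("row_Index")

# Step 2: keep only the first row per index value, i.e. the lowest 'col'.
deduped = sorted_df.loc[~sorted_df.index.duplicated(keep="first")]
print(deduped)
```

With this data, the rows kept for the duplicated indexes 8 and 10 are the ones with 'col' equal to 23.0 and 12.0.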