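How to sort the values of a pandas DataFrame by a specific column in a custom way (using a lambda function, as with sorted in the standard library)

Time: 2020-11-08 17:42:19

Tags: python pandas dataframe

Given the following data: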

       bit   val
0    bit_3  37.7
1   bit_16  36.7
2    bit_6  40.6
3   bit_10  48.4
4    bit_2  50.5
5   bit_14  40.8
6    bit_4  52.0
7   bit_17  50.8
8    bit_7  37.8
9    bit_1  49.6
10  bit_13  46.7
11   bit_0  40.9
12  bit_19  41.3
13  bit_18  41.6
14   bit_9  51.1
15  bit_15  41.1
16   bit_8  39.2
17  bit_12  51.7
18  bit_11  49.8
19   bit_5  55.1

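I want to sort the data by the bit column, based on the trailing number.

If this were a standard Python list, I could do the following:

sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))

I am not sure how to apply this to a DataFrame.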
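To make the problem concrete, here is a minimal sketch using only a handful of the labels from the table above as a plain list. It shows why a key is needed at all: the default string sort compares character by character, so bit_10 lands before bit_2. The variable names are illustrative only.

```python
# A few of the labels from the question, as a plain Python list.
bits = ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"]

# Default string comparison works character by character,
# so "bit_10" sorts before "bit_2".
lexicographic = sorted(bits)

# Sorting on the integer after the underscore gives the intended order.
numeric = sorted(bits, key=lambda x: int(x.split("_")[-1]))

print(lexicographic)
print(numeric)
```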

5 answers:

Answer 0 (score: 2)

Try natsort (a third-party package):

from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]: 
       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

Answer 1 (score: 1)

Use df.sort_values with a key that applies .str.split("_", expand=True) and casts the result to integer with .astype(int):

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))

Output:

       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

If you need to reset the index, just add .reset_index(drop=True):

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)

Output:

       bit   val
0    bit_0  40.9
1    bit_1  49.6
2    bit_2  50.5
3    bit_3  37.7
4    bit_4  52.0
5    bit_5  55.1
6    bit_6  40.6
7    bit_7  37.8
8    bit_8  39.2
9    bit_9  51.1
10  bit_10  48.4
11  bit_11  49.8
12  bit_12  51.7
13  bit_13  46.7
14  bit_14  40.8
15  bit_15  41.1
16  bit_16  36.7
17  bit_17  50.8
18  bit_18  41.6
19  bit_19  41.3

Answer 2 (score: 1)

With pandas >= 1.1.0, you can use key much as you would in sorted.
In my solution I sort on the bit column, but for the comparison I strip the bit_ prefix:

df.sort_values(
    by='bit', 
    key=lambda x: x.str.replace('bit_', '').astype(int),
)

    bit     val
11  bit_0   40.9
9   bit_1   49.6
4   bit_2   50.5
0   bit_3   37.7
6   bit_4   52.0

Documentation for .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
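One difference from sorted is worth spelling out: the key passed to sort_values is called once with the whole column as a Series, not once per element, and it should return a Series of the same length. The sketch below illustrates this on a five-row subset of the question's data; trailing_number is a hypothetical helper name, and pandas >= 1.1.0 is assumed.

```python
import pandas as pd

# Five rows taken from the question's data.
df = pd.DataFrame({
    "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
    "val": [37.7, 36.7, 40.6, 48.4, 50.5],
})

def trailing_number(col: pd.Series) -> pd.Series:
    # Hypothetical helper. Unlike the key in sorted(), this receives the
    # entire column at once and returns a Series of the same length.
    assert isinstance(col, pd.Series)
    return col.str.split("_").str[-1].astype(int)

# Rows are ordered by the integers returned from the key function;
# the original index labels travel with their rows.
result = df.sort_values(by="bit", key=trailing_number)
print(result)
```

Because the key works on the whole column, vectorized string methods such as .str.split or .str.replace fit naturally here, whereas int(x.split("_")[-1]) on its own would fail when handed a Series.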

Answer 3 (score: 0)

An efficient approach is to build a Series sorted the way you want, then pass its index back to the DataFrame:

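# Extract the integer after the underscore, then sort those integers
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()

# Reorder the DataFrame with the sorted labels (.loc selects by label,
# so this also works when the index is not the default 0..n-1)
df = df.loc[sort_series.index]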

Result:

       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

You can reset the DataFrame index if needed.

Answer 4 (score: 0)

You can combine str.extract, Series.argsort, and df.loc:
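The sorted helper Series carries index labels, so label-based selection is the safer way to reorder when the index is not the default 0..n-1. Here is a minimal sketch of the same idea on a hypothetical frame with string labels (the labels a to e are invented for illustration):

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default 0..n-1 index.
df = pd.DataFrame(
    {
        "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
        "val": [37.7, 36.7, 40.6, 48.4, 50.5],
    },
    index=["a", "b", "c", "d", "e"],
)

# Integer part of each entry, still carrying the original index labels.
bit_vals = df["bit"].str.split("_", expand=True)[1].astype(int)

# The sorted Series' index holds labels, so select with .loc rather than .iloc.
sorted_df = df.loc[bit_vals.sort_values().index]
print(sorted_df)
```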

In [1038]: ix = df.bit.str.extract(r'(\d+)', expand=False).astype(int).argsort().tolist()

In [1039]: df.loc[ix]
Out[1039]: 
       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
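Series.argsort returns integer positions rather than index labels, so the df.loc call above gives the right rows because the question's frame has the default 0..n-1 index. As a hedged variant for frames with other indexes, the sketch below selects by position with .iloc; the index values 10 to 50 are invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions.
df = pd.DataFrame(
    {
        "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
        "val": [37.7, 36.7, 40.6, 48.4, 50.5],
    },
    index=[10, 20, 30, 40, 50],
)

# Pull out the digits and convert them to integers.
numbers = df["bit"].str.extract(r"(\d+)", expand=False).astype(int)

# argsort yields 0-based positions that would sort the values.
positions = numbers.argsort().to_numpy()

# Positions call for position-based selection.
sorted_df = df.iloc[positions]
print(sorted_df)
```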