Using Pandas in Python 3.8.
Given an Index of string values like the following:
import pandas as pd
foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_12', 'score_13', 'score_14',
                'score_15', 'score_16', 'score_17', 'score_18', 'score_19', 'score_2',
                'score_20', 'score_21', 'score_22', 'score_23', 'score_24', 'score_25',
                'score_26', 'score_27', 'score_3', 'score_4', 'score_5', 'score_6',
                'score_7', 'score_8', 'score_9'],
               dtype='object', name='score_field')
What is the "right" way to sort it so that the values come out in numeric order, e.g. 'score_1', 'score_2' ... 'score_9', 'score_10', etc.?
This doesn't work:
foo.sort_values(key=lambda x: int(x.split('_')[1]))
AttributeError: 'Index' object has no attribute 'split'
This doesn't work either:
foo.sort_values(key=lambda val: val.str.split('_').str[1].astype(int))
AttributeError: Can only use .str accessor with string values!
This does work, but it feels ugly:
foo = pd.Index(sorted(foo.to_list(), key=lambda x: int(x.split('_')[1])),
               dtype=foo.dtype, name=foo.name)
Answer 0 (score: 1)
Honestly, what you have makes sense to me, but if you want a pure-Pandas approach, use Index.str.split and argsort:
foo[foo.str.split('_').str[1].astype(int).argsort()]
Index(['score_1', 'score_2', 'score_3', 'score_4', 'score_5', 'score_6',
       'score_7', 'score_8', 'score_9', 'score_10', 'score_11', 'score_12',
       'score_13', 'score_14', 'score_15', 'score_16', 'score_17', 'score_18',
       'score_19', 'score_20', 'score_21', 'score_22', 'score_23', 'score_24',
       'score_25', 'score_26', 'score_27'],
      dtype='object', name='score_field')
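To make it clearer what that one-liner is doing, here is a rough step-by-step sketch of the same idea. It uses a shortened, made-up index in place of the full one from the question, and the intermediate variable names (numeric_part, order, sorted_foo) are only illustrative:

```python
import pandas as pd

# Shortened stand-in for the index in the question (hypothetical sample data).
foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_2', 'score_3'],
               dtype='object', name='score_field')

# Step 1: split each label on '_' and convert the second piece to an integer.
numeric_part = foo.str.split('_').str[1].astype(int)

# Step 2: argsort returns the positions that would sort those integers.
order = numeric_part.argsort()

# Step 3: indexing by position reorders the original labels.
# The result is still an Index and keeps its name.
sorted_foo = foo[order]
print(sorted_foo.tolist())
```

Because only the positions are reordered, the original string labels, the dtype, and the name carry over without having to rebuild the Index by hand.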
Alternatively, if you're open to a third-party library:
import natsort as ns
pd.Index(ns.natsorted(foo), name=foo.name)
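If adding a dependency is not an option, a similar "natural" ordering can be approximated with the standard library. The sketch below is only a rough stand-in and is not how natsort is implemented; natural_key is a hypothetical helper name, and the index is a shortened, made-up sample:

```python
import re
import pandas as pd

def natural_key(label):
    """Hypothetical helper: split a label into text and digit runs."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs and can be compared as integers.
    parts = re.split(r'(\d+)', label)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

# Shortened stand-in for the index in the question (hypothetical sample data).
foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_2', 'score_3'],
               dtype='object', name='score_field')

# Sort the labels in plain Python, then rebuild the Index with the same
# dtype and name, as in the working approach from the question.
sorted_foo = pd.Index(sorted(foo, key=natural_key),
                      dtype=foo.dtype, name=foo.name)
print(sorted_foo.tolist())
```

Unlike splitting on '_' and taking a fixed position, this key does not assume every label has the same prefix, at the cost of sorting in Python rather than with vectorized Pandas operations.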