Pandas Index - sort a string index by a numeric substring

Time: 2021-04-22 20:41:02

Tags: python pandas

Using Pandas with Python 3.8.

Given an index of string values like the following:

import pandas as pd

foo = pd.Index(['score_1', 'score_10', 'score_11', 'score_12', 'score_13', 'score_14',
       'score_15', 'score_16', 'score_17', 'score_18', 'score_19', 'score_2',
       'score_20', 'score_21', 'score_22', 'score_23', 'score_24', 'score_25',
       'score_26', 'score_27', 'score_3', 'score_4', 'score_5', 'score_6',
       'score_7', 'score_8', 'score_9'],
      dtype='object', name='score_field')

What is the "correct" way to sort it so that the values are in numeric order, e.g. 'score_1', 'score_2' ... 'score_9', 'score_10', etc.?

This doesn't work:

foo.sort_values(key=lambda x: int(x.split('_')[1]))
AttributeError: 'Index' object has no attribute 'split'
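The first attempt fails because the `key` callable of `Index.sort_values` is not called once per element like the `key` of Python's built-in `sorted()`. According to the pandas documentation it should be vectorized: it receives the whole Index and should return an Index of the same shape. If you would rather keep an element-wise lambda, `Index.map` applies it to each value. The sketch below uses a shortened copy of `foo` and reorders by `argsort` instead of going through `sort_values`.

```python
import pandas as pd

# Shortened version of the question's index, for illustration only.
foo = pd.Index(['score_1', 'score_10', 'score_2', 'score_9'], name='score_field')

# Index.map calls the lambda once per element, so plain str methods work here.
nums = foo.map(lambda s: int(s.split('_')[1]))

# argsort returns the positions that would sort nums; use them to reorder foo.
sorted_foo = foo[nums.argsort()]
print(sorted_foo.tolist())
```

Indexing an Index with the integer positions from `argsort` keeps its `name`.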

This doesn't work either:

foo.sort_values(key=lambda val: val.str.split('_').str[1].astype(int))
AttributeError: Can only use .str accessor with string values!

This does work, but it feels ugly:

foo = pd.Index(sorted(foo.to_list(), key=lambda x: int(x.split('_')[1])),
      dtype=foo.dtype, name=foo.name)
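The working version can be trimmed a little. `sorted()` accepts the Index directly because an Index is iterable, so the `to_list()` call is unnecessary. A minimal sketch on a shortened copy of `foo`:

```python
import pandas as pd

# Shortened version of the question's index, for illustration only.
foo = pd.Index(['score_1', 'score_10', 'score_2', 'score_9'], name='score_field')

# sorted() iterates the Index element by element, so the key sees plain strings.
# The right-hand side is evaluated first, so foo.dtype and foo.name still refer
# to the original Index when the new one is built.
foo = pd.Index(sorted(foo, key=lambda x: int(x.split('_')[1])),
               dtype=foo.dtype, name=foo.name)
print(foo.tolist())
```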

1 answer:

Answer 0 (score: 1)

Honestly, what you have makes sense to me, but if you want a pure Pandas way, use Index.str.split and argsort:

foo[foo.str.split('_').str[1].astype(int).argsort()]

Index(['score_1', 'score_2', 'score_3', 'score_4', 'score_5', 'score_6',
   'score_7', 'score_8', 'score_9', 'score_10', 'score_11', 'score_12',
   'score_13', 'score_14', 'score_15', 'score_16', 'score_17', 'score_18',
   'score_19', 'score_20', 'score_21', 'score_22', 'score_23', 'score_24',
   'score_25', 'score_26', 'score_27'],
  dtype='object', name='score_field')
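The one-liner above can be unpacked into separate steps. The variant below is a sketch with two assumed tweaks that go beyond the answer. It uses `rsplit` with `n=1` so that only the last underscore counts, which matters only if some prefixes contain underscores themselves (the question's data does not). It also passes `kind='stable'` so that entries with the same number keep their original relative order. The sample values, including `'bonus_score_2'`, are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a tie and an underscore inside the prefix.
foo = pd.Index(['score_10', 'score_2', 'score_1', 'bonus_score_2'], name='score_field')

# rsplit(n=1) splits only on the last underscore, so 'bonus_score_2'
# still yields just the numeric suffix '2'.
nums = foo.str.rsplit('_', n=1).str[-1].astype(int)

# A stable argsort keeps tied numbers (the two 2s) in their original order.
order = nums.argsort(kind='stable')

sorted_foo = foo[order]
print(sorted_foo.tolist())
```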

Or, if you are fine with using a third-party library:

import natsort as ns
pd.Index(ns.natsorted(foo), name=foo.name)
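If adding a dependency is not an option, a rough stand-in for natsort's basic behaviour can be written with the standard library `re` module. This is a simplified sketch, and the helper name `natural_key` is hypothetical (it is not part of natsort or pandas). It only handles unsigned integers and ignores natsort's options for floats, signed numbers, and locale-aware ordering.

```python
import re

import pandas as pd


def natural_key(s):
    # re.split with a capturing group keeps the digit runs in the result,
    # alternating text, digits, text, ... (the first item may be '').
    parts = re.split(r'(\d+)', s)
    # Odd positions are the captured digit runs, so they are compared as ints;
    # even positions are the text between them and stay strings. Two keys
    # therefore always hold the same type at the same position.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


# Hypothetical sample data mixing two prefixes.
foo = pd.Index(['score_10', 'score_2', 'level_1', 'score_1'], name='score_field')
sorted_foo = pd.Index(sorted(foo, key=natural_key), name=foo.name)
print(sorted_foo.tolist())
```

Unlike the split-on-underscore approaches, this key does not assume a fixed `prefix_number` layout: text segments are compared alphabetically and number segments numerically.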