How to sort the index of a pandas MultiIndex DataFrame by a custom order

Time: 2021-07-27 20:28:49

Tags: pandas dataframe multi-index

Here is some code to generate a sample DataFrame:

import pandas as pd

fruits = pd.DataFrame()
fruits['month'] = ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april', 'june', 'march', 'march', 'june', 'april']
ind_mnth = fruits['month'].values
fruits['fruit'] = ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry']
ind_fruit = fruits['fruit'].values
fruits['price'] = [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60]
fruits_grp = fruits.set_index([ind_mnth, ind_fruit], drop=False)

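How can I sort the rows of this MultiIndex DataFrame so that, under each outer index (month), the inner index (fruit) is sorted in a custom order, and rows with the same outer index are grouped together?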

1 Answer:

Answer 0: (score: 0)

One way is to create an ordered categorical from the fruit column with the order you want, then call set_index with a MultiIndex.from_arrays built from each level, as you did, and finally call sort_index.

# custom order
ord_fruit = ['apple', 'pear', 'cherry', 'orange']

# create an ordered Categorical for the fruits
f = pd.Categorical(fruits['fruit'], categories=ord_fruit, ordered=True)

# get the month values; these could also be given a custom order in the same way
m = fruits['month'].to_numpy()

# get the result
fruits_grp = fruits.set_index(pd.MultiIndex.from_arrays([m, f])).sort_index()
print(fruits_grp)

                month   fruit  price
april pear      april    pear     45  # pear before cherry
      cherry    april  cherry     60
      cherry    april  cherry     60
feb   pear        feb    pear     40
      orange      feb  orange     20
jan   apple       jan   apple     30
      apple       jan   apple     30
june  apple      june   apple     37
      pear       june    pear     45
march cherry    march  cherry     55  # cherry before orange
      orange    march  orange     25
      orange    march  orange     25

Note that sort_index will sort the other level (month) alphabetically. If you do not want that, you can create your own order for each level.
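The same idea can be applied to the month level. The following is a minimal sketch, assuming calendar order is what is wanted for the months; the names ord_month and fruits_sorted are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question
fruits = pd.DataFrame({
    "month": ["jan", "feb", "feb", "march", "jan", "april",
              "april", "june", "march", "march", "june", "april"],
    "fruit": ["apple", "orange", "pear", "orange", "apple", "pear",
              "cherry", "pear", "orange", "cherry", "apple", "cherry"],
    "price": [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# Custom orders: ord_fruit is from the answer, ord_month is an assumption
ord_fruit = ["apple", "pear", "cherry", "orange"]
ord_month = ["jan", "feb", "march", "april", "june"]

# Ordered categoricals carry the custom order into the index levels
m = pd.Categorical(fruits["month"], categories=ord_month, ordered=True)
f = pd.Categorical(fruits["fruit"], categories=ord_fruit, ordered=True)

# Build the MultiIndex from both categoricals, then sort by it
fruits_sorted = fruits.set_index(pd.MultiIndex.from_arrays([m, f])).sort_index()
print(fruits_sorted)
```

With both levels categorical, sort_index follows the category order on each level instead of alphabetical order, so jan comes first and june last.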