I am trying to simulate a DataFrame with a hierarchical index.
This is what the DataFrame looks like; it comes with the default index:
>>> import pandas as pd
>>> raw_data = ({'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune','Delhi', 'Kanpur', 'Mumbai', 'Pune'],
... 'rank': ['1st', '2nd', '1st', '2nd','1st', '2nd', '1st', '2nd'],
... 'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju','Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
... 'score1': [10,15,20,25,10,15,20,25],
... 'score2': [20,35,40,45,20,35,40,45]})
>>> df = pd.DataFrame(raw_data, columns = ['city', 'rank', 'name', 'score1', 'score2'])
>>> df
     city  rank    name  score1  score2
0   Delhi   1st  Ramesh      10      20
1  Kanpur   2nd  Kirpal      15      35
2  Mumbai   1st   Jungi      20      40
3    Pune   2nd   Sanju      25      45
4   Delhi   1st  Ramesh      10      20
5  Kanpur   2nd  Kirpal      15      35
6  Mumbai   1st   Jungi      20      40
7    Pune   2nd   Sanju      25      45
I want to use the set_index method to build a hierarchical index from the 'city' and 'rank' columns while keeping the original columns intact:
>>> df.set_index(['city', 'rank'], drop=False)
               city  rank    name  score1  score2
city   rank
Delhi  1st    Delhi   1st  Ramesh      10      20
Kanpur 2nd   Kanpur   2nd  Kirpal      15      35
Mumbai 1st   Mumbai   1st   Jungi      20      40
Pune   2nd     Pune   2nd   Sanju      25      45
Delhi  1st    Delhi   1st  Ramesh      10      20
Kanpur 2nd   Kanpur   2nd  Kirpal      15      35
Mumbai 1st   Mumbai   1st   Jungi      20      40
Pune   2nd     Pune   2nd   Sanju      25      45
However, I want the rows indexed first by city and then by rank.
Answer 0 (score: 2)
You're almost there; you just need to apply sort_index():
df.set_index(['city','rank'], drop=False).sort_index()
This yields:
               city  rank    name  score1  score2
city   rank
Delhi  1st    Delhi   1st  Ramesh      10      20
       1st    Delhi   1st  Ramesh      10      20
Kanpur 2nd   Kanpur   2nd  Kirpal      15      35
       2nd   Kanpur   2nd  Kirpal      15      35
Mumbai 1st   Mumbai   1st   Jungi      20      40
       1st   Mumbai   1st   Jungi      20      40
Pune   2nd     Pune   2nd   Sanju      25      45
       2nd     Pune   2nd   Sanju      25      45
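As a rough check of what sort_index() changes here, the sketch below rebuilds the frame from the question and compares the index before and after sorting. The variable names `indexed` and `sorted_df` are introduced only for illustration and do not appear in the question.

```python
import pandas as pd

# Same data as in the question.
raw_data = {
    'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
    'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
    'score1': [10, 15, 20, 25, 10, 15, 20, 25],
    'score2': [20, 35, 40, 45, 20, 35, 40, 45],
}
df = pd.DataFrame(raw_data, columns=['city', 'rank', 'name', 'score1', 'score2'])

# drop=False keeps 'city' and 'rank' as ordinary columns as well as index levels.
indexed = df.set_index(['city', 'rank'], drop=False)
sorted_df = indexed.sort_index()

# Before sorting, the index follows the original row order, so it is not monotonic.
print(indexed.index.is_monotonic_increasing)

# After sorting, rows that share a (city, rank) key sit next to each other.
print(sorted_df.index.is_monotonic_increasing)

# Selecting on the first level returns both Delhi rows.
print(sorted_df.loc['Delhi'])
```

Sorting does not remove anything; it only groups equal keys together, which is why each city still appears twice in the output above.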
To remove the duplicate rows, add drop_duplicates():
df.set_index(['city','rank'], drop=False).sort_index().drop_duplicates()
This yields:
               city  rank    name  score1  score2
city   rank
Delhi  1st    Delhi   1st  Ramesh      10      20
Kanpur 2nd   Kanpur   2nd  Kirpal      15      35
Mumbai 1st   Mumbai   1st   Jungi      20      40
Pune   2nd     Pune   2nd   Sanju      25      45
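One point worth keeping in mind: drop_duplicates() compares column values, not index labels. It works here because drop=False keeps city and rank as columns. The following is a minimal sketch, assuming you instead wanted to deduplicate on the index itself; the Index.duplicated() variant is an alternative not mentioned in the answer, and the names `by_columns` and `by_index` are hypothetical.

```python
import pandas as pd

# Same data as in the question.
raw_data = {
    'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
    'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
    'score1': [10, 15, 20, 25, 10, 15, 20, 25],
    'score2': [20, 35, 40, 45, 20, 35, 40, 45],
}
df = pd.DataFrame(raw_data, columns=['city', 'rank', 'name', 'score1', 'score2'])

sorted_df = df.set_index(['city', 'rank'], drop=False).sort_index()

# The approach from the answer: compare column values, keep the first occurrence.
by_columns = sorted_df.drop_duplicates()

# Alternative (an assumption, not from the answer): keep the first row per index key.
by_index = sorted_df[~sorted_df.index.duplicated(keep='first')]

print(by_columns)
print(by_index.equals(by_columns))
```

For this data the two approaches select the same rows, because every repeated (city, rank) key also repeats all column values. They would differ if two rows shared a key but had different scores.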