Pandas - sort a MultiIndex by the sum of sizes

Time: 2018-04-15 12:01:15

Tags: python pandas sorting multi-index

I have a Pandas DataFrame that represents user requests to different websites, where each row of the DataFrame is one request.

To simplify the problem, there are only two columns: website and IP.

The data looks like this:

website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.2

website1.com 1.1.1.1
website1.com 1.1.1.1
website1.com 1.1.1.3
website1.com 1.1.1.3

website2.com 1.1.1.4
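To make the question reproducible, here is a minimal sketch that rebuilds the sample data above as a DataFrame. The column names `website` and `IP` are assumed from the groupby call used later in the question.

```python
import pandas as pd

# Sample data from the question: one row per request.
# Column names "website" and "IP" are assumed from the question's groupby call.
df = pd.DataFrame({
    "website": ["website.com"] * 5 + ["website1.com"] * 4 + ["website2.com"],
    "IP": ["1.1.1.1"] * 4 + ["1.1.1.2"]
          + ["1.1.1.1", "1.1.1.1", "1.1.1.3", "1.1.1.3"]
          + ["1.1.1.4"],
})

# Number of requests per (website, IP) pair
print(df.groupby(["website", "IP"]).size())
```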

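I want to group this DataFrame so that I can see the websites with the most hits sorted in descending order and, within each website, the IPs that visited it, also in descending order.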

My current solution is:

grouped_df = df.groupby(['website', 'IP'])
grouped_df.size()

which gives me:

website      IP         Size

website.com  1.1.1.1    4
             1.1.1.2    1
website1.com 1.1.1.1    2
             1.1.1.3    2
website2.com 1.1.1.4    1

I can sort the grouped result by size with grouped_df.size().sort_values(ascending=False), but that sorts by the number of requests from each user (IP):

website      IP         Size

website.com  1.1.1.1    4    <---- sorted by size (N of requests from IP)
website1.com 1.1.1.1    2
             1.1.1.3    2
website.com  1.1.1.2    1
website2.com 1.1.1.4    1

rather than by the total number of requests to each website:

website      IP         Size    Sum

website.com  1.1.1.1    4       5    <---- sorted by sum and sorted by size inside
             1.1.1.2    1
website1.com 1.1.1.1    2       4
             1.1.1.3    2
website2.com 1.1.1.4    1       1

How can I do this?

2 Answers:

Answer 0 (score: 1)

Use:

df1 = df.groupby(['website', 'IP']).size().to_frame('Size')
df1['Sum'] = df1.groupby(level=0)['Size'].transform('sum')
#alternative solution
#df1['Sum'] = df1.reset_index()['website'].map(df1.groupby(level=0)['Size'].sum()).values
df1 = df1.sort_values(['Sum','Size'],ascending=False)

print (df1)
                      Size  Sum
website      IP                
website.com  1.1.1.1     4    5
             1.1.1.2     1    5
website1.com 1.1.1.1     2    4
             1.1.1.3     2    4
website2.com 1.1.1.4     1    1

Explanation:

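  1. First aggregate the counts with size.
  2. Convert the resulting Series to a one-column DataFrame with Series.to_frame.
  3. Create the new column Sum with GroupBy.transform('sum') grouped by the first index level (or, in the alternative solution, by mapping the per-website sums with map).
  4. Last, sort with sort_values by Sum and then Size, both descending.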
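The steps above can be sketched end to end as follows, with one hedged caveat: if two websites had the same total, sorting only by ['Sum', 'Size'] could interleave their rows. This variant adds the `website` index level as a tie-breaker between the two keys, which assumes a pandas version where sort_values accepts index level names in `by` (0.23 or later). The data, including `website3.com` and its IPs, is hypothetical and chosen only to produce a tie.

```python
import pandas as pd

# Hypothetical data: website.com and website3.com both receive 5 requests in total
df = pd.DataFrame({
    "website": ["website.com"] * 5 + ["website3.com"] * 5,
    "IP": ["1.1.1.1"] * 4 + ["1.1.1.2"] + ["1.1.1.5"] * 3 + ["1.1.1.6"] * 2,
})

df1 = df.groupby(["website", "IP"]).size().to_frame("Size")
df1["Sum"] = df1.groupby(level=0)["Size"].transform("sum")

# Sorting by ["Sum", "Size"] alone would give sizes 4, 3, 2, 1 and mix the two
# websites. Putting the "website" index level between the two keys keeps each
# website's rows together while still sorting sizes descending inside it.
df1 = df1.sort_values(["Sum", "website", "Size"], ascending=[False, True, False])
print(df1)
```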

Answer 1 (score: 1)

Here is one way. The idea is to create a new column and then sort by this column at the end of the process.
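A minimal sketch of that idea follows; it is not the answerer's original code, which was not preserved. It assumes the new column holds each website's total request count, and the column name `total` is hypothetical. Here the helper column lives on a flat frame and is dropped after sorting, so the final result keeps only Size under the (website, IP) MultiIndex.

```python
import pandas as pd

# Sample data from the question (column names assumed to be "website" and "IP")
df = pd.DataFrame({
    "website": ["website.com"] * 5 + ["website1.com"] * 4 + ["website2.com"],
    "IP": ["1.1.1.1"] * 4 + ["1.1.1.2"]
          + ["1.1.1.1", "1.1.1.1", "1.1.1.3", "1.1.1.3"]
          + ["1.1.1.4"],
})

# Count requests per (website, IP) pair as a flat frame
counts = df.groupby(["website", "IP"]).size().reset_index(name="Size")

# Hypothetical helper column "total": number of requests per website
counts["total"] = counts.groupby("website")["Size"].transform("sum")

# Sort by the helper column at the end, then drop it and restore the MultiIndex
result = (
    counts.sort_values(["total", "website", "Size"], ascending=[False, True, False])
          .drop(columns="total")
          .set_index(["website", "IP"])
)
print(result)
```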
