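I have a Pandas DataFrame that represents user requests to different websites; each row in the DataFrame is one request.
To keep things simple, there are only two columns: website and IP.
The data looks like this: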
website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.1
website.com 1.1.1.2
website1.com 1.1.1.1
website1.com 1.1.1.1
website1.com 1.1.1.3
website1.com 1.1.1.3
website2.com 1.1.1.4
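For anyone who wants to reproduce this, a minimal sketch that rebuilds the sample rows above could look like the following. The column names website and IP are taken from the groupby call used further down; the variable name rows is only for illustration.

```python
import pandas as pd

# The ten sample requests listed above, one (website, IP) pair per request.
rows = [
    ("website.com", "1.1.1.1"),
    ("website.com", "1.1.1.1"),
    ("website.com", "1.1.1.1"),
    ("website.com", "1.1.1.1"),
    ("website.com", "1.1.1.2"),
    ("website1.com", "1.1.1.1"),
    ("website1.com", "1.1.1.1"),
    ("website1.com", "1.1.1.3"),
    ("website1.com", "1.1.1.3"),
    ("website2.com", "1.1.1.4"),
]

# Column names are assumed to match the ones used in the groupby call.
df = pd.DataFrame(rows, columns=["website", "IP"])
print(df)
```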
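I want to group this DataFrame so that the websites appear in descending order of total hits, and within each website the IPs that accessed it are also listed in descending order.
My current solution is: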
grouped_df = df.groupby(['website', 'IP'])
grouped_df.size()
which gives me:
website       IP       Size
website.com   1.1.1.1  4
              1.1.1.2  1
website1.com  1.1.1.1  2
              1.1.1.3  2
website2.com  1.1.1.4  1
I can sort this grouped result by size with grouped_df.size().sort_values(ascending=False), but that orders the rows by the number of requests from each IP:
website       IP       Size
website.com   1.1.1.1  4     <---- sorted by size (N of requests from IP)
website1.com  1.1.1.1  2
              1.1.1.3  2
website.com   1.1.1.2  1
website2.com  1.1.1.4  1
rather than by the total number of requests to each website:
website       IP       Size  Sum
website.com   1.1.1.1  4     5    <---- sorted by sum and sorted by size inside
              1.1.1.2  1
website1.com  1.1.1.1  2     4
              1.1.1.3  2
website2.com  1.1.1.4  1     1
How can I achieve this?
Answer 0 (score: 1)
Use:
df1 = df.groupby(['website', 'IP']).size().to_frame('Size')
df1['Sum'] = df1.groupby(level=0)['Size'].transform('sum')
# alternative solution (DataFrame.sum(level=...) was removed in pandas 2.0, so group by the level instead)
# df1['Sum'] = df1.reset_index()['website'].map(df1.groupby(level=0)['Size'].sum()).values
df1 = df1.sort_values(['Sum','Size'],ascending=False)
print (df1)
                      Size  Sum
website      IP
website.com  1.1.1.1     4    5
             1.1.1.2     1    5
website1.com 1.1.1.1     2    4
             1.1.1.3     2    4
website2.com 1.1.1.4     1    1
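Explanation:
1. Aggregate with size, then use Series.to_frame to convert the resulting Series to a one-column DataFrame.
2. Create the new column Sum by grouping on the first index level and applying GroupBy.transform with sum, or alternatively with map.
3. Finally, sort with sort_values.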
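Put together, the three steps can be sketched as one self-contained script. The sample data is rebuilt inline from the question; nothing here goes beyond the calls already named in this answer.

```python
import pandas as pd

# Sample data from the question: one row per request.
df = pd.DataFrame({
    "website": ["website.com"] * 5 + ["website1.com"] * 4 + ["website2.com"],
    "IP": ["1.1.1.1"] * 4 + ["1.1.1.2"]
          + ["1.1.1.1"] * 2 + ["1.1.1.3"] * 2
          + ["1.1.1.4"],
})

# Step 1: count requests per (website, IP) pair and keep them as a
# one-column DataFrame named 'Size'.
df1 = df.groupby(["website", "IP"]).size().to_frame("Size")

# Step 2: broadcast each website's total back onto all of its rows.
df1["Sum"] = df1.groupby(level=0)["Size"].transform("sum")

# Step 3: sort by the website total first, then by the per-IP count.
df1 = df1.sort_values(["Sum", "Size"], ascending=False)
print(df1)
```

One thing to keep in mind: if two websites happen to have the same Sum, sorting on Sum and Size alone may interleave their rows, so it can be worth adding the website level as an extra sort key in that case.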
Answer 1 (score: 1)
Here is one way. The idea is to create a helper column, then sort by that column at the end of the process.
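A rough sketch of that idea, under some assumptions: the helper column is called total here purely as a placeholder, the counts are kept in a flat DataFrame instead of a MultiIndex, and website is added as a tie-breaker so that rows of one site stay together. This is one possible reading of the approach, not necessarily the exact code the answer had in mind.

```python
import pandas as pd

# Sample data from the question: one row per request.
df = pd.DataFrame({
    "website": ["website.com"] * 5 + ["website1.com"] * 4 + ["website2.com"],
    "IP": ["1.1.1.1"] * 4 + ["1.1.1.2"]
          + ["1.1.1.1"] * 2 + ["1.1.1.3"] * 2
          + ["1.1.1.4"],
})

# Per-(website, IP) request counts as a flat table.
counts = df.groupby(["website", "IP"]).size().reset_index(name="Size")

# Hypothetical helper column 'total': requests per website, repeated
# on every row belonging to that website.
counts["total"] = counts.groupby("website")["Size"].transform("sum")

# Sort by the helper column at the end, then drop it again.
# 'website' is included as a tie-breaker so sites with equal totals
# are not interleaved.
result = (
    counts.sort_values(["total", "website", "Size"],
                       ascending=[False, True, False])
          .drop(columns="total")
          .reset_index(drop=True)
)
print(result)
```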