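I'm building a web analytics query. Its purpose is to find the average number of page views per session among sessions that viewed a particular page, so that we can report data such as the following:
and so on.
I'm using HyperSQL DB. All of the data lives in a single table that basically looks like this: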
session_id | event | page_id
1 | 'page load' | 1
1 | 'user action' | 1
1 | 'page load' | 2
2 | 'page load' | 1
3 | 'page load' | 1
3 | 'page load' | 2
3 | 'user action' | 2
3 | 'page load' | 3
... etc ...
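In my queries and attempts so far, I've been grouping by page ID. I need to get the session IDs that reference this initial set of page IDs, and then query again to get all of the page IDs referenced by that new set of session IDs.
Then I want to AVG the 'page load' events for that set of session IDs.
Does that make sense? I've tried a lot of things, but being inexperienced with SQL, I can't crack it. I've tried some inner joins and some subqueries (which gave me cardinality violations).
Update: The desired output would look like: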
page_id | sessions_including_this_page | avg_pages_per_session
1 | 2 | 2.1
2 | 4 | 1.7
Thanks!
Update 2: If I were doing this in server-side JavaScript, it would look something like this:
var events = [
  { session_id: 1, event: 'page_load', page_id: 1 },
  { session_id: 1, event: 'page_load', page_id: 2 },
  { session_id: 1, event: 'page_load', page_id: 3 },
  { session_id: 2, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 2 }
];
// get session IDs that loaded page_id = 2
var sessions_viewing_page2 = []; // array to store session IDs
for (var i = 0; i < events.length; i++) {
  if (events[i].page_id === 2) sessions_viewing_page2.push(events[i].session_id);
}
// so now: sessions_viewing_page2 = [1,3];
// get total page loads for those sessions that viewed page_id==2
// we'll iterate through events again
// and check if a session ID is in our array
var pageloads_per_session = {}; // obj to store page load counts by session ID
for (var j = 0; j < events.length; j++) {
  var sid = events[j].session_id;
  if (sessions_viewing_page2.indexOf(sid) !== -1) {
    // are we already incrementing this session ID?
    if (!pageloads_per_session[sid]) pageloads_per_session[sid] = 1;
    else pageloads_per_session[sid]++;
  }
}
// this gives us
// pageloads_per_session[1] = 3;
// pageloads_per_session[3] = 2;
// then, since I know each session_id in pageloads_per_session viewed page_id==2... I can calculate "average page loads per session that viewed page_id == 2".
// in this case... we have 2 distinct sessions (1,3), and 5 total page loads (3+2)... for an average of 2.5 page loads per session that included page_id == 2.
// quite a mouthful. thanks!
Answer 0 (score: 1)
I think this is what you want:
select a.page_id, a.num_ses, avg(c.num_pg_ld_sespg) as avg_ses_pg_exist
from (select page_id, count(distinct session_id) as num_ses  -- sessions per page
      from tbl
      where event = 'page load'
      group by page_id) a
join (select session_id, page_id, count(*) as num_pg_ld_sespg  -- loads per session and page
      from tbl
      where event = 'page load'
      group by session_id, page_id) c
  on a.page_id = c.page_id
join (select session_id, count(*) as num_pg_ld_ses  -- total loads per session
      from tbl
      where event = 'page load'
      group by session_id) b
  on b.session_id = c.session_id
group by a.page_id, a.num_ses
order by a.page_id
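See the sqlfiddle test at: http://sqlfiddle.com/#!2/d79a2/1/0
Note that I added one row in addition to your sample data: insert into tbl values (2, 'page load', 1);
because with the sample data alone, the average in column 3 is 1.
I'm calculating the average in column 3 as the average number of page loads per session, where the session has at least one page load for the page on the given row, but the "number of page loads" part of that statement considers all page loads, not just loads of the page on the given row.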
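As a cross-check on the metric described above, here is a minimal JavaScript sketch in the style of the question's own example. It is not the answerer's method and is not claimed to reproduce the sqlfiddle output. The helper name pageStats and the variable names are hypothetical. It assumes the rows carry the event string 'page load' as in the question's table, and it uses only the question's eight sample rows, without the extra row added for the fiddle.

```javascript
// Hypothetical helper (not from the original post): for each page, count the
// distinct sessions that loaded it, then average the TOTAL page loads of
// those sessions (all pages, not only the page on the given row).
function pageStats(events) {
  var loadsPerSession = {};  // session_id -> total page loads in that session
  var sessionsPerPage = {};  // page_id -> set of session_ids that loaded it
  events.forEach(function (e) {
    if (e.event !== 'page load') return;  // skip 'user action' rows
    loadsPerSession[e.session_id] = (loadsPerSession[e.session_id] || 0) + 1;
    if (!sessionsPerPage[e.page_id]) sessionsPerPage[e.page_id] = {};
    sessionsPerPage[e.page_id][e.session_id] = true;
  });
  return Object.keys(sessionsPerPage).map(function (pageId) {
    var sessionIds = Object.keys(sessionsPerPage[pageId]);
    var totalLoads = sessionIds.reduce(function (sum, sid) {
      return sum + loadsPerSession[sid];
    }, 0);
    return {
      page_id: Number(pageId),
      sessions_including_this_page: sessionIds.length,
      avg_pages_per_session: totalLoads / sessionIds.length
    };
  });
}

// The sample rows from the question's table.
var sampleEvents = [
  { session_id: 1, event: 'page load', page_id: 1 },
  { session_id: 1, event: 'user action', page_id: 1 },
  { session_id: 1, event: 'page load', page_id: 2 },
  { session_id: 2, event: 'page load', page_id: 1 },
  { session_id: 3, event: 'page load', page_id: 1 },
  { session_id: 3, event: 'page load', page_id: 2 },
  { session_id: 3, event: 'user action', page_id: 2 },
  { session_id: 3, event: 'page load', page_id: 3 }
];

var stats = pageStats(sampleEvents);
// page 1: 3 sessions, (2 + 1 + 3) / 3 = 2
// page 2: 2 sessions, (2 + 3) / 2 = 2.5
// page 3: 1 session,  3 / 1 = 3
```

Running a small script like this over a known data set gives reference numbers to compare against whatever the SQL returns on the same rows.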