Inner join on the same table with AVG

Time: 2014-03-30 17:36:03

Tags: sql subquery inner-join average hsqldb

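I am building a web analytics query. Its purpose is to find the average number of page views per session for sessions that viewed a particular page, so that we can report figures such as:

  • "Sessions that loaded our home page averaged 2.1 pages"
  • "Sessions that loaded a particular article averaged 2.4 pages"

and so on.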

I am using HyperSQL DB. All of the data is in a single table that looks basically like this:

session_id | event         | page_id
1          | 'page load'   | 1
1          | 'user action' | 1
1          | 'page load'   | 2
2          | 'page load'   | 1
3          | 'page load'   | 1
3          | 'page load'   | 2
3          | 'user action' | 2
3          | 'page load'   | 3
... etc ...

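In my queries/attempts so far, I have been grouping by page_id. I need to get the session IDs that reference this initial set of page IDs, and then query again to get all of the page IDs referenced by that new set of session IDs.

Then I want to AVG the 'page load' events over that set of session IDs.

Does that make sense? I have tried a lot of things, but being inexperienced with SQL, I can't crack it. I have tried some inner joins and some subqueries (which gave me cardinality violations).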

Update: the desired output would look like:

page_id | sessions_including_this_page | avg_pages_per_session
1       | 2                            | 2.1
2       | 4                            | 1.7

Thanks!

Update 2: if I were doing this in server-side JavaScript, it would look something like this:

var events = [
  { session_id: 1, event: 'page_load', page_id: 1 },
  { session_id: 1, event: 'page_load', page_id: 2 },
  { session_id: 1, event: 'page_load', page_id: 3 },
  { session_id: 2, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 2 }
];

// get session IDs that loaded page_id = 2
var sessions_viewing_page2 = [];  // array to store session IDs
for ( var i in events ) {
    if ( events[i].page_id === 2 ) sessions_viewing_page2.push( events[i].session_id );
}
// so now:  sessions_viewing_page2 = [1,3];

// get total page loads for those sessions that viewed page_id==2
// we'll iterate through events again
// and check if a session ID is in our array
var pageloads_per_session = {}; // obj to store page load counts by session ID
for (var j in events) {
  if ( sessions_viewing_page2.indexOf( events[j].session_id ) != -1 ) {
    // are we already incrementing this session ID?
    if ( !pageloads_per_session[events[j].session_id] ) pageloads_per_session[events[j].session_id] = 1;
    else pageloads_per_session[events[j].session_id]++;
  }
}
// this gives us
// pageloads_per_session[1] = 3;
// pageloads_per_session[3] = 2;

// then, since I know each session_id in pageloads_per_session viewed page_id==2... I can calculate "average page loads per session that viewed page_id == 2".
// in this case... we have 2 distinct sessions (1,3), and 5 total page loads (3+2)... for an average of 2.5 page loads per session that included page_id == 2.

// quite a mouthful.  thanks!


1 answer:

Answer 0 (score: 1)

I think this is what you want:

select a.page_id, a.num_ses, avg(c.num_pg_ld_sespg) as avg_ses_pg_exist
  from (select page_id, count(distinct session_id) as num_ses
          from tbl
         where event = 'page load'
         group by page_id) a
  join (select session_id, page_id, count(*) as num_pg_ld_sespg
          from tbl
         where event = 'page load'
         group by session_id, page_id) c
    on a.page_id = c.page_id
  join (select session_id, count(*) as num_pg_ld_ses
          from tbl
         where event = 'page load'
         group by session_id) b
    on b.session_id = c.session_id
 group by a.page_id, a.num_ses
 order by a.page_id

See the sqlfiddle test at: http://sqlfiddle.com/#!2/d79a2/1/0

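Note that I added one row beyond the sample data: insert into tbl values (2,'page load',1);

because with the sample data alone the average, in column 3, was 1.

I am computing the average in column 3 as the average number of page loads per session, over sessions that have at least one page load of the page on the given row; the "number of page loads" part of that statement counts all page loads, not just loads of the page on the given row.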
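
To make that definition concrete, here is a minimal JavaScript sketch in the style of the question's Update 2, generalized to every page at once. It illustrates the metric as described in the paragraph above and is not a translation of the SQL. The function name `pageStats` and the variable `sample` are hypothetical; the output field names are borrowed from the desired output in the question, and the rows are the sample rows from the question's table.

```javascript
// Hypothetical helper: for every page, count the sessions that loaded it and
// average the TOTAL page loads of those sessions (all pages, not just this one).
function pageStats(events) {
  const loadsPerSession = new Map(); // session_id -> total page loads
  const sessionsPerPage = new Map(); // page_id -> Set of session_ids that loaded it

  for (const e of events) {
    if (e.event !== 'page load') continue; // skip 'user action' rows
    loadsPerSession.set(e.session_id, (loadsPerSession.get(e.session_id) || 0) + 1);
    if (!sessionsPerPage.has(e.page_id)) sessionsPerPage.set(e.page_id, new Set());
    sessionsPerPage.get(e.page_id).add(e.session_id);
  }

  const rows = [];
  for (const [page_id, sessions] of sessionsPerPage) {
    let total = 0;
    for (const s of sessions) total += loadsPerSession.get(s);
    rows.push({
      page_id: page_id,
      sessions_including_this_page: sessions.size,
      avg_pages_per_session: total / sessions.size
    });
  }
  return rows.sort((a, b) => a.page_id - b.page_id);
}

// Sample rows from the question's table
const sample = [
  { session_id: 1, event: 'page load',   page_id: 1 },
  { session_id: 1, event: 'user action', page_id: 1 },
  { session_id: 1, event: 'page load',   page_id: 2 },
  { session_id: 2, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 2 },
  { session_id: 3, event: 'user action', page_id: 2 },
  { session_id: 3, event: 'page load',   page_id: 3 }
];

console.log(pageStats(sample));
// page 1: 3 sessions, average 2
// page 2: 2 sessions, average 2.5
// page 3: 1 session,  average 3
```

Counting total loads per session first and then averaging over each page's set of sessions keeps this to a single pass over the events plus one pass over the pages, which mirrors the "per-session subquery joined to a per-page subquery" shape one would use in SQL.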