What database table layout should I use to quickly retrieve aggregate/distinct data for a date range?

Time: 2011-10-05 18:30:07

Tags: database schema

I am writing a web application to analyze my web server logs.

I plan to run a daily SQL job that denormalizes my web server logs into an SQL database, so that the web application never has to read the raw web server logs.

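I want the web application's user to enter a date range and have the web application return:

  • a table with each browser in one column and, in the next column, the number of unique client IPs in that date range
  • a table with each operating system in one column and, in the next column, the number of unique client IPs in that date range
  • a table with each browser + operating system combination in one column and, in the next column, the number of unique client IPs in that date range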

(You can see the same idea in Google Analytics.)

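We have about 100,000 unique client IPs per month, and I would like to keep the denormalized data for one year (although many of those client IPs will be the same from month to month).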

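  1. What table layout should hold the denormalized information?
  2. What SQL queries would let the web application retrieve the required information efficiently?
  3. (I am not asking how to make the SQL job write to these tables; I can figure that out.)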

1 answer:

Answer 0: (score: 0)

I would have the SQL job total the accesses and put the daily results into a table such as [logsum]:

Table [logsum]:
sum_id (int / auto_increment)
sum_day (date)
sum_name (string)
sum_count (number)

and put the denormalized log data into [logaccess]:

Table [logaccess]:
access_id (int / auto_increment)
access_day (date)
access_ip (string)
access_browser (string)
access_os (string)
access_click_count (int)
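
To make the layout concrete, here is a minimal sketch of the two tables using SQLite through Python's standard `sqlite3` module. The column types, the ISO date strings, and the two UNIQUE constraints are assumptions added for illustration; the layout above does not state them, but one row per (day, IP) and one row per (day, name) is what the later steps rely on.

```python
import sqlite3

# Hypothetical DDL for the two tables; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logsum (
    sum_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    sum_day   TEXT NOT NULL,       -- ISO date string, e.g. '2011-10-05'
    sum_name  TEXT NOT NULL,
    sum_count INTEGER NOT NULL,
    UNIQUE (sum_day, sum_name)     -- one summary row per day and name
);
CREATE TABLE IF NOT EXISTS logaccess (
    access_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    access_day         TEXT NOT NULL,
    access_ip          TEXT NOT NULL,
    access_browser     TEXT NOT NULL,
    access_os          TEXT NOT NULL,
    access_click_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (access_day, access_ip) -- one row per client IP and day
);
"""

def create_schema(conn):
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE 'log%'"))
print(tables)
```

The UNIQUE key on (access_day, access_ip) also gives the database an index that starts with the date, which is likely to help date-range filters.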

SQL job:

1) Add all log entries to [logaccess] and total the clicks per IP and day:

for each line in log
{
  info = parse(line)
  //- upsert one row per (day, IP); needs a UNIQUE key on (access_day, access_ip)
  //- values are passed as bound parameters (?) instead of being concatenated into the SQL
  execute_sql('INSERT INTO logaccess
                 (access_day, access_ip, access_browser, access_os, access_click_count)
               VALUES (CURDATE(), ?, ?, ?, 1)
               ON DUPLICATE KEY UPDATE
                 access_click_count = access_click_count + 1;',
              info[IP], info[browser], info[OS])
}
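
As a runnable illustration of step 1, the sketch below counts clicks per (day, IP) in SQLite. It swaps in a portable two-statement pattern (INSERT OR IGNORE followed by UPDATE) rather than any MySQL-specific syntax. The function name `record_hit`, the sample entries, and the UNIQUE constraint are assumptions made for the example.

```python
import sqlite3

def record_hit(conn, day, ip, browser, os_name):
    """Count one click for (day, ip), creating the row on first sight."""
    # Step 1: create the row with a zero count if it does not exist yet.
    conn.execute(
        "INSERT OR IGNORE INTO logaccess "
        "(access_day, access_ip, access_browser, access_os, access_click_count) "
        "VALUES (?, ?, ?, ?, 0)",
        (day, ip, browser, os_name),
    )
    # Step 2: increment the counter of the row that is now guaranteed to exist.
    conn.execute(
        "UPDATE logaccess SET access_click_count = access_click_count + 1 "
        "WHERE access_day = ? AND access_ip = ?",
        (day, ip),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logaccess ("
    " access_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " access_day TEXT, access_ip TEXT, access_browser TEXT, access_os TEXT,"
    " access_click_count INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (access_day, access_ip))"
)
# Already-parsed sample entries, made up for illustration.
for hit in [
    ("2011-10-05", "10.0.0.1", "Firefox", "Windows"),
    ("2011-10-05", "10.0.0.1", "Firefox", "Windows"),
    ("2011-10-05", "10.0.0.2", "Chrome", "Linux"),
]:
    record_hit(conn, *hit)
rows = conn.execute(
    "SELECT access_ip, access_click_count FROM logaccess ORDER BY access_ip"
).fetchall()
print(rows)
```

Passing the day in from the parsed log line, instead of using the current date, would let the job reprocess older log files; that is a design choice the answer leaves open.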

2) Summarize and save to [logsum]:

//- get OS count per day
res = execute_sql('SELECT access_day, access_os, count(access_id) AS C FROM logaccess GROUP BY access_day, access_os;');

//- write to [logsum]
for each record in res
{
   //- REPLACE takes no WHERE clause; it relies on a UNIQUE key on (sum_day, sum_name)
   execute_sql('REPLACE INTO logsum SET sum_day=?, sum_name=?, sum_count=?;',
               record['access_day'], record['access_os'], record['C'])
}
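
The select-then-loop pattern above can probably be collapsed into a single statement. The following sketch does the per-day OS summary in SQLite with INSERT OR REPLACE ... SELECT; the function name `summarize_os`, the sample rows, and the UNIQUE key on (sum_day, sum_name) are assumptions for illustration, not part of the original answer.

```python
import sqlite3

def summarize_os(conn):
    """Store the per-day, per-OS row counts of logaccess in logsum."""
    # One logaccess row exists per (day, IP), so COUNT(access_id)
    # is the number of client IPs seen with that OS on that day.
    conn.execute(
        "INSERT OR REPLACE INTO logsum (sum_day, sum_name, sum_count) "
        "SELECT access_day, access_os, COUNT(access_id) "
        "FROM logaccess GROUP BY access_day, access_os"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logaccess (
    access_id INTEGER PRIMARY KEY AUTOINCREMENT,
    access_day TEXT, access_ip TEXT, access_browser TEXT, access_os TEXT,
    access_click_count INTEGER,
    UNIQUE (access_day, access_ip));
CREATE TABLE logsum (
    sum_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sum_day TEXT, sum_name TEXT, sum_count INTEGER,
    UNIQUE (sum_day, sum_name));
""")
conn.executemany(
    "INSERT INTO logaccess (access_day, access_ip, access_browser, access_os,"
    " access_click_count) VALUES (?, ?, ?, ?, ?)",
    [
        ("2011-10-05", "10.0.0.1", "Firefox", "Windows", 3),
        ("2011-10-05", "10.0.0.2", "Chrome", "Windows", 1),
        ("2011-10-05", "10.0.0.3", "Chrome", "Linux", 2),
        ("2011-10-06", "10.0.0.1", "Firefox", "Windows", 1),
    ],
)
summarize_os(conn)
summarize_os(conn)  # a second run replaces rows instead of duplicating them
summary = conn.execute(
    "SELECT sum_day, sum_name, sum_count FROM logsum ORDER BY sum_day, sum_name"
).fetchall()
print(summary)
```

The browser, IP, and click summaries would follow the same shape with a different SELECT list.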


//- get browser count per Day
res = execute_sql('SELECT access_day, access_browser, count(access_id) AS C FROM logaccess GROUP BY access_day, access_browser;');

//- write to [logsum]
for each record in res
{
   //- REPLACE takes no WHERE clause; it relies on a UNIQUE key on (sum_day, sum_name)
   execute_sql('REPLACE INTO logsum SET sum_day=?, sum_name=?, sum_count=?;',
               record['access_day'], record['access_browser'], record['C'])
}

//- get IP count per Day
res = execute_sql('SELECT access_day, count(access_id) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   //- REPLACE takes no WHERE clause; it relies on a UNIQUE key on (sum_day, sum_name)
   execute_sql('REPLACE INTO logsum SET sum_day=?, sum_name=\'ip\', sum_count=?;',
               record['access_day'], record['C'])
}

//- get click count per Day
res = execute_sql('SELECT access_day, sum(access_click_count) AS C FROM logaccess GROUP BY access_day;')

//- write to [logsum]
for each record in res
{
   //- REPLACE takes no WHERE clause; it relies on a UNIQUE key on (sum_day, sum_name)
   execute_sql('REPLACE INTO logsum SET sum_day=?, sum_name=\'clicks\', sum_count=?;',
               record['access_day'], record['C'])
}
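
One caveat about reading these daily summaries back for a date range: each daily count includes an IP once per day, so adding the daily counts together overstates the number of unique IPs whenever the same client returns on several days. A possible alternative, sketched below in SQLite, is to count distinct IPs straight from [logaccess] for the requested range. The `GROUPINGS` names and the `unique_ips` function are hypothetical and exist only for this example.

```python
import sqlite3

# Whitelisted grouping choices; the keys are hypothetical report names.
GROUPINGS = {
    "browser": "access_browser",
    "os": "access_os",
    "browser_os": "access_browser, access_os",
}

def unique_ips(conn, report, start_day, end_day):
    """Return unique client IP counts per group for an inclusive date range."""
    cols = GROUPINGS[report]  # raises KeyError for unknown report names
    sql = (
        f"SELECT {cols}, COUNT(DISTINCT access_ip) FROM logaccess "
        "WHERE access_day BETWEEN ? AND ? "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql, (start_day, end_day)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logaccess ("
    " access_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " access_day TEXT, access_ip TEXT, access_browser TEXT, access_os TEXT,"
    " access_click_count INTEGER,"
    " UNIQUE (access_day, access_ip))"
)
conn.executemany(
    "INSERT INTO logaccess (access_day, access_ip, access_browser, access_os,"
    " access_click_count) VALUES (?, ?, ?, ?, ?)",
    [
        ("2011-10-05", "10.0.0.1", "Firefox", "Windows", 3),
        ("2011-10-06", "10.0.0.1", "Firefox", "Windows", 1),  # same IP, next day
        ("2011-10-05", "10.0.0.2", "Chrome", "Linux", 2),
        ("2011-10-06", "10.0.0.3", "Chrome", "Windows", 1),
        ("2011-11-01", "10.0.0.4", "Chrome", "Linux", 1),     # outside the range
    ],
)
by_browser = unique_ips(conn, "browser", "2011-10-05", "2011-10-06")
by_os = unique_ips(conn, "os", "2011-10-05", "2011-10-06")
by_both = unique_ips(conn, "browser_os", "2011-10-05", "2011-10-06")
print(by_browser, by_os, by_both)
```

In the sample data the Firefox client appears on two days but is counted once, which is the behaviour the question asks for. Whether this is fast enough at roughly 100,000 IPs per month would need to be measured.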

3) Clean up / delete entries older than 1 year:

DELETE FROM logaccess WHERE access_day < DATE_SUB(CURDATE(), INTERVAL 1 YEAR);
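
The cutoff has to lie one year in the past; a cutoff one year in the future would match every row. A small SQLite sketch of the same cleanup follows, with the current day passed in as a parameter so the behaviour is easy to check. The function name `purge_old` and the sample rows are assumptions for illustration.

```python
import sqlite3

def purge_old(conn, today):
    """Delete logaccess rows more than one year older than `today` (ISO date)."""
    cur = conn.execute(
        "DELETE FROM logaccess WHERE access_day < date(?, '-1 year')",
        (today,),
    )
    return cur.rowcount  # number of rows removed

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logaccess ("
    " access_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " access_day TEXT, access_ip TEXT, access_browser TEXT, access_os TEXT,"
    " access_click_count INTEGER)"
)
conn.executemany(
    "INSERT INTO logaccess (access_day, access_ip, access_browser, access_os,"
    " access_click_count) VALUES (?, ?, ?, ?, ?)",
    [
        ("2010-10-04", "10.0.0.1", "Firefox", "Windows", 1),  # older than a year
        ("2010-10-05", "10.0.0.2", "Chrome", "Linux", 1),     # exactly a year old
        ("2011-10-01", "10.0.0.3", "Chrome", "Windows", 1),   # recent
    ],
)
deleted = purge_old(conn, "2011-10-05")
remaining = [r[0] for r in conn.execute(
    "SELECT access_day FROM logaccess ORDER BY access_day")]
print(deleted, remaining)
```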

Maybe this helps you. Regards, Thomas