Finding the most common occurrences in an array

Date: 2012-01-22 16:44:20

Tags: php arrays url search text-files

I have a text log file containing lines of data separated by "|"

For example:

date | time | ip | geo-location (city) | page viewed ......

I need to find the 10 most frequently occurring "page viewed" values in the text file...

Each page view in the log is written as:

//pageurl 

Since each log entry is on its own line, I assume I would search for the page URL between

// [url name] \r\n

How do I code the search so that it lists the top 10 URLs and puts them into an array?

For example:

$url[0]  << this would be the most frequently occurring URL
$url[1]  << this would be the second most frequently occurring URL

and so on, until I can list them all:

$url[9]  << which would be the 10th most common URL

I'm not sure how to search for the text between "//" and "\r\n"

and then turn the ten most common occurrences into an array...

Thanks in advance for your help :)

Edit: here are 2 lines from my log, in case that helps:

sunday, january 22, 2012 | 16:14:36 | 82.**.***.*** | bolton | //error 
sunday, january 22, 2012 | 17:12:52 | 82.**.***.*** | bolton | //videos

Thanks

1 Answer

Answer 0 (score: 0)

Based on the information provided, here's a fairly naive approach:

/* get the contents of the log file */
$log_file = file_get_contents(__DIR__.'/log.txt');

/* split the log into an array of lines */
$log_lines = explode(PHP_EOL, $log_file);

/* we don't need the log file anymore, so free up some memory */
unset($log_file);

/* loop through each line */
$page_views = array();
foreach ($log_lines as $line) {
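    /* split the line on the pipe character; array_pop() needs a real
       variable, passing explode() directly triggers a "passed by reference" notice */
    $fields = explode('|', $line);
    /* the last field is the page view; strip the leading ' //' */
    $page_views[] = ltrim(array_pop($fields), ' /');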
}

/* we don't need the array of lines either, so free up that memory */
unset($log_lines);

/* count the frequency of each unique occurrence */
$urls = array_count_values($page_views);

/* sort highest to lowest (needed: array_count_values keeps first-seen order and does not sort) */
arsort($urls, SORT_NUMERIC);

print_r($urls);
/* [page_url] => num page views, ... */

/* that gives you occurrences, but you want a numerically
   indexed array for a top ten, so... */

$top_ten = array();
$i = 0;
/* loop through the array, and store the keys in a new one until we have 10 of them */
foreach ($urls as $url => $views) {
    if ($i >= 10) break;
    $top_ten[] = $url;
    $i++;
}

print_r($top_ten);
/* [0] => page url, ... */

**Script output:**

Array
(
    [videos] => 1
    [error ] => 1
)
Array
(
    [0] => videos
    [1] => error 
)
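The question specifically asked how to grab the text between "//" and the end of the line. A minimal sketch of that regex approach follows, assuming PHP 7 or later and assuming the page view is always the last thing on each line, written as " //pageurl". The function name `top_page_views()` and the third log line in the usage are made up for illustration. Unlike the `ltrim()` version, `\s*` also swallows a trailing space or `\r`, so `//error ` comes out as `error` rather than `error `.

```php
<?php
// Hypothetical helper (not from the original answer): returns the $limit most
// frequent page views, assuming each log line ends with " //pageurl".
function top_page_views(string $log, int $limit = 10): array
{
    // With the m modifier, $ matches at the end of every line.
    // (\S+) captures the URL after "//"; \s* swallows a trailing space or \r,
    // so "//error \r\n" yields "error".
    preg_match_all('~//(\S+)\s*$~m', $log, $matches);

    // [url => number of views], in first-seen order (not sorted)
    $counts = array_count_values($matches[1]);

    // Sort by view count, highest first, keeping the URL keys
    arsort($counts, SORT_NUMERIC);

    // The first $limit URLs as a 0-indexed array: $url[0] is the most common
    return array_slice(array_keys($counts), 0, $limit);
}

// Usage: the two lines from the question plus one made-up line
$log = "sunday, january 22, 2012 | 16:14:36 | 82.**.***.*** | bolton | //error \r\n"
     . "sunday, january 22, 2012 | 17:12:52 | 82.**.***.*** | bolton | //videos\r\n"
     . "sunday, january 22, 2012 | 17:30:00 | 82.**.***.*** | bolton | //videos\r\n";

$url = top_page_views($log);
print_r($url); // [0] => videos, [1] => error
```

`array_slice(array_keys(...))` replaces the manual counter loop; it returns fewer than 10 entries when the log has fewer distinct URLs.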

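This isn't an optimal solution: the larger the log file, the longer it will take. For that, you'd be better off logging to a database and querying it instead.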
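If you stay with a flat file, one way to at least keep memory in check is to read it line by line instead of loading it whole with `file_get_contents()` and `explode()`. It still has to read every line, so the time still grows with the file size; only the per-URL counts are held in memory. This is a sketch assuming PHP 7 or later and the same " //pageurl" last-field format; `top_page_views_from_file()` is a hypothetical name.

```php
<?php
// Hypothetical helper: stream a log file line by line and return the $limit
// most frequent page views. Assumes the page view is the last "|"-separated
// field, written as " //pageurl".
function top_page_views_from_file(string $path, int $limit = 10): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $counts = [];
    // fgets() returns one line at a time, so the whole file is never in memory
    while (($line = fgets($handle)) !== false) {
        $fields = explode('|', $line);
        // last field, minus surrounding whitespace (including \r\n) and the leading '//'
        $url = ltrim(trim(end($fields)), '/');
        if ($url === '') {
            continue; // skip blank lines
        }
        $counts[$url] = ($counts[$url] ?? 0) + 1;
    }
    fclose($handle);

    // Highest count first, then keep the first $limit URLs as a 0-indexed array
    arsort($counts, SORT_NUMERIC);
    return array_slice(array_keys($counts), 0, $limit);
}
```

Called as `$url = top_page_views_from_file(__DIR__ . '/log.txt');`, it returns the same kind of array as the answer's `$top_ten`.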