I have a text log file containing lines of data separated by "|",
for example:
date | time | ip | geo-location (city) | page viewed ......
I need to find the 10 most frequently occurring "page viewed" values in the text file.
Each page view is logged as:
//pageurl
Since each log entry is on its own line, I assume I would search for the page URL between
// [url name] \r\n
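How can I code a search that lists the top 10 URLs and puts them into an array?
For example:
$url[0] <<this would be the most frequently occurring url
$url[1] <<this would be the second most frequently occurring url
and so on, until I can list them all:
$url[9] <<which would be the 10th most common url
I'm not sure how to search between "//" and "\r\n"
and then turn the ten most common occurrences into an array.
Thanks in advance for your help :)
EDIT: here are 2 lines from my log, in case that helps: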
sunday, january 22, 2012 | 16:14:36 | 82.**.***.*** | bolton | //error
sunday, january 22, 2012 | 17:12:52 | 82.**.***.*** | bolton | //videos
Thanks
Answer 0 (score: 0)
Based on the information provided, here is a fairly naive approach:
/* get the contents of the log file */
$log_file = file_get_contents(__DIR__.'/log.txt');
/* split the log into an array of lines (handles \r\n, \n and \r; skips empty lines) */
$log_lines = preg_split('/\r\n|\r|\n/', $log_file, -1, PREG_SPLIT_NO_EMPTY);
/* we don't need the log file anymore, so free up some memory */
unset($log_file);
/* loop through each line */
$page_views = array();
foreach ($log_lines as $line) {
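/* skip blank lines (e.g. a trailing newline at the end of the file) */
if (trim($line) === '') continue;
/* get the text after the last pipe character (the page view),
   minus surrounding whitespace and the leading '//' */
$fields = explode('|', $line);
$page_views[] = ltrim(trim(end($fields)), '/');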
}
/* we don't need the array of lines either, so free up that memory */
unset($log_lines);
/* count the frequency of each unique occurrence */
$urls = array_count_values($page_views);
/* sort highest to lowest (array_count_values does not sort, so this is required) */
arsort($urls, SORT_NUMERIC);
print_r($urls);
/* [page_url] => num page views, ... */
/* that gives you occurrences, but you want a numerical
indexed array for a top ten, so... */
$top_ten = array();
$i = 0;
/* loop through the array, and store the keys in a new one until we have 10 of them */
foreach ($urls as $url => $views) {
if ($i >= 10) break;
$top_ten[] = $url;
$i++;
}
print_r($top_ten);
/* [0] => page url, ... */
**Script output:**
Array
(
[videos] => 1
[error ] => 1
)
Array
(
[0] => videos
[1] => error
)
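The question asks specifically about matching the text between `//` and the line ending. A minimal sketch of that variant follows, assuming PHP 7 or later and that `//` only ever appears in front of the page URL. The function name `top_urls_by_regex` and its `$limit` parameter are hypothetical, introduced here for illustration only.

```php
<?php
/* Hypothetical helper, not part of the original answer. */
function top_urls_by_regex(string $log, int $limit = 10): array
{
    /* capture everything after '//' up to (but not including) the line ending */
    preg_match_all('~//([^\r\n]*)~', $log, $matches);

    /* trim stray whitespace, then count how often each URL occurs */
    $counts = array_count_values(array_map('trim', $matches[1]));

    /* sort by count, highest first, keeping the URLs as keys */
    arsort($counts, SORT_NUMERIC);

    /* return only the URLs, as a 0-indexed array of at most $limit items */
    return array_slice(array_keys($counts), 0, $limit);
}
```

Called as `$url = top_urls_by_regex(file_get_contents(__DIR__.'/log.txt'));`, this should give `$url[0]` as the most frequent URL, `$url[1]` as the second, and so on. How URLs with equal counts are ordered relative to each other may depend on the PHP version.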
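This is not an optimal solution: the larger the log file, the longer it takes. For that, you would be better off logging to a database and querying it.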
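If a database is not an option, one way to soften the large-file problem is to read the log one line at a time instead of loading it whole. The sketch below assumes PHP 7 or later and the pipe-separated layout shown in the question; the function name `top_urls_streaming` is hypothetical. Run time still grows with the size of the file, but memory use should depend on the number of distinct URLs, not on the file size.

```php
<?php
/* Hypothetical helper: count page views while streaming the file. */
function top_urls_streaming(string $path, int $limit = 10): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $counts = array();
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);            /* drops the trailing \r\n */
        if ($line === '') continue;     /* skip blank lines */

        /* take the text after the last pipe, or the whole line if there is none */
        $pos  = strrpos($line, '|');
        $page = ($pos === false) ? $line : (string) substr($line, $pos + 1);

        /* strip surrounding whitespace and the leading '//' */
        $url = ltrim(trim($page), '/');
        $counts[$url] = ($counts[$url] ?? 0) + 1;
    }
    fclose($handle);

    /* sort by count, highest first, and keep the first $limit URLs */
    arsort($counts, SORT_NUMERIC);
    return array_slice(array_keys($counts), 0, $limit);
}
```

Usage would look like `$url = top_urls_streaming(__DIR__.'/log.txt');`, after which `$url[0]` through `$url[9]` hold the ten most viewed pages (fewer if the log has fewer distinct URLs).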