Question

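I have a /redir directory on my website, where an .htaccess file redirects various static addresses to other addresses so that I can count how many times a particular link is visited. I want to write a script to help me tally that data.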

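I already have two scripts. The first runs as a cron job at 2:00 AM every day and appends the relevant data from access.log.0 to a file called log.total. The second is meant to be run interactively: given a minimum and a maximum date, it generates the counts.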

The cron script:

#!/bin/bash
# Keep only requests for /redir/ from access.log.0 and append
# "[date zone] path" (space-separated fields 4, 5 and 7) to the running total.
grep "GET /redir/" access.log.0 | cut -d " " -f4,5,7 >> log.total

This produces data like the following:

[21/Aug/2012:00:31:27 -0700] /redir/abc.html
[21/Aug/2012:00:31:35 -0700] /redir/def.html
[21/Aug/2012:00:31:35 -0700] /redir/abc.html
[21/Aug/2012:00:31:40 -0700] /redir/ghi.html
[21/Aug/2012:00:31:46 -0700] /redir/123.html
[21/Aug/2012:00:31:58 -0700] /redir/def.html
[21/Aug/2012:00:32:07 -0700] /redir/abc.html
etc...
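The field numbers in cut -d " " -f4,5,7 depend on Apache's log layout. As a sketch, assuming access.log.0 uses the common (or combined) log format, the made-up line below (the IP address, status code and size are invented) shows which space-separated fields end up in log.total:

```shell
#!/bin/bash
# A made-up access-log line in Apache's common log format (assumption:
# the real access.log.0 uses the same field layout).
line='203.0.113.7 - - [21/Aug/2012:00:31:27 -0700] "GET /redir/abc.html HTTP/1.1" 302 215'

# Splitting on single spaces: field 4 is "[date:time", field 5 is "zone]",
# field 6 is "\"GET" and field 7 is the requested path.
printf '%s\n' "$line" | cut -d ' ' -f4,5,7
```

This prints [21/Aug/2012:00:31:27 -0700] /redir/abc.html, the same shape as the lines above. If the server used a custom LogFormat, the field numbers would need adjusting.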

Now I want a script that I can run as readLogs.sh "log.total" "1 week ago" "today" and that counts how many times each file was accessed between one week ago and today.

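I've posted the script I came up with below. It does the job, but it has some limitations, which are described there. The output can be in any readable format.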

Answer 1

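This would be easier if you converted the dates to UNIX timestamps for the range comparison. You could add them to each line of your file as an extra field, right after the bracketed date: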

[21/Aug/2012:00:31:27 -0700] 1345534287 /redir/abc.html

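(You can get a UNIX timestamp with GNU date: date +%s --date "date string". I'm assuming you want to keep the human-readable timestamp, but you could replace it with the UNIX timestamp if you prefer.)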
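The Apache timestamp itself is not in a form that date is guaranteed to accept, so a conversion step is needed before date +%s. A minimal sketch, assuming GNU date: it rewrites 21/Aug/2012:00:31:27 -0700 into 21 Aug 2012 00:31:27 -0700, which GNU date parses, and inserts the result as a new field. add_epoch is a hypothetical helper name; it reads lines in the existing log.total format on standard input.

```shell
#!/bin/bash
# Hypothetical helper: turn "[21/Aug/2012:00:31:27 -0700] /redir/abc.html"
# into "[21/Aug/2012:00:31:27 -0700] 1345534287 /redir/abc.html".
# Assumes GNU date and the log.total layout shown in the question.
add_epoch() {
    local datetime tz path ts
    while read -r datetime tz path; do
        ts="${datetime#\[}"     # 21/Aug/2012:00:31:27
        ts="${ts/:/ }"          # 21/Aug/2012 00:31:27 (first colon only)
        ts="${ts//\// }"        # 21 Aug 2012 00:31:27
        printf '%s %s %s %s\n' "$datetime" "$tz" \
            "$(date +%s --date "$ts ${tz%\]}")" "$path"
    done
}

printf '%s\n' '[21/Aug/2012:00:31:27 -0700] /redir/abc.html' | add_epoch
```

In the cron script, the output of cut could be piped through add_epoch before being appended to log.total. Calling date once per line is slow for large logs, but it should be acceptable for a once-a-day batch of redirect hits.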

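Here is a modified script that assumes your log file has been changed as suggested. It also uses bash parameter expansion to make the script shorter: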

[Update: changed to exit as soon as a timestamp past the end of the range is reached.]

#!/bin/bash

# :- means to use the RHS if the LHS is null or unset
FILE="${1:-log.total}"  
MINTIME="${2:-1 day ago}"
MAXTIME="${3:-now}"

START=$( date +%s --date "$MINTIME" )
END=$( date +%s --date "$MAXTIME" )

# No need for cut; just have awk print only the field you want
# Field 1 is the date/time
# Field 2 is the timezone
# Field 3 is the timestamp you added
# Field 4 is the path
# The early exit assumes the file is in chronological order.
# sort -n sorts by the numeric count (a plain sort compares text).
awk -v start="$START" -v end="$END" '$3 > end { exit } $3 >= start { print $4 }' "$FILE" |
  sort | uniq -c | sort -n
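To see the range filter at work, here is a small run on the question's sample lines with the suggested epoch field added (the epoch values were computed for the -0700 offset). count_between is a hypothetical wrapper around the awk pipeline above. Because of the early exit, it assumes the lines are in chronological order, which should hold when log.total is only ever appended to by the daily cron job.

```shell
#!/bin/bash
# Hypothetical wrapper around the awk pipeline above: reads
# "[date zone] epoch path" lines on stdin, counts paths with
# start <= epoch <= end, and stops at the first epoch past end.
count_between() {
    awk -v start="$1" -v end="$2" '$3 > end { exit } $3 >= start { print $4 }' |
        sort | uniq -c | sort -n
}

# The question's sample data with the suggested epoch field added.
sample='[21/Aug/2012:00:31:27 -0700] 1345534287 /redir/abc.html
[21/Aug/2012:00:31:35 -0700] 1345534295 /redir/def.html
[21/Aug/2012:00:31:35 -0700] 1345534295 /redir/abc.html
[21/Aug/2012:00:31:40 -0700] 1345534300 /redir/ghi.html
[21/Aug/2012:00:31:46 -0700] 1345534306 /redir/123.html
[21/Aug/2012:00:31:58 -0700] 1345534318 /redir/def.html
[21/Aug/2012:00:32:07 -0700] 1345534327 /redir/abc.html'

# Window: 21/Aug/2012:00:31:30 -0700 to 21/Aug/2012:00:32:00 -0700
printf '%s\n' "$sample" | count_between 1345534290 1345534320
```

With this window, /redir/def.html is counted twice and abc, ghi and 123 once each; the 00:31:27 request is skipped and awk stops reading at the 00:32:07 one.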

Answer 2

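This is the script I came up with. Its limitation is that it doesn't work correctly if a date you enter does not appear in the log. For example, if I enter "1 day ago" as the start date but there were no visits yesterday, it uses the beginning of the file as the place to start counting.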

#!/bin/bash

if [ "$1" ]; then
    FILE="$1"
else
    FILE="log.total"
fi

#if test -t 0; then
#INPUT=`cat $FILE`
#else
#INPUT="$(cat -)"
#fi

if [ "$2" ]; then
    MINTIME="$2"
else
    MINTIME="1 day ago"
fi

if [ "$3" ]; then
    MAXTIME="$3"
else
    MAXTIME="now"
fi

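START=$(grep -m 1 -n "$(date --date="$MINTIME" +%d/%b/%Y)" "$FILE" | cut -d: -f1)
if [ -z "$START" ]; then
    START=0
fi

END=$(grep -m 1 -n "$(date --date="$MAXTIME" +%d/%b/%Y)" "$FILE" | cut -d: -f1)
if [ -z "$END" ]; then
    # wc -l < file prints only the line count; +1 so that NR < END keeps the last line
    END=$(( $(wc -l < "$FILE") + 1 ))
fi

# Keep lines START..END-1, print the path (field 3), count, and sort by count
awk -v s="$START" -v e="$END" 'NR >= s && NR < e' "$FILE" | cut -d" " -f3 | sort | uniq -c | sort -n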

The output looks like this:

    1 /redir/123.html
    1 /redir/ghi.html
    2 /redir/def.html
    3 /redir/abc.html
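One way around the limitation described above, without changing the log format, is to compare timestamps instead of searching for a line that contains the start date. The sketch below is an alternative, not the script above: count_range is a hypothetical function, it assumes GNU date, and it ignores the -0700 field, so it is only correct if the log was written in the same time zone as the machine running it. It turns [21/Aug/2012:00:31:27 into the fixed-width key 20120821003127, which orders the same way as the time itself, so it no longer matters whether any request happened on the start or end date.

```shell
#!/bin/bash
# Hypothetical function: count_range FILE START_DATE END_DATE
# Assumes GNU date and lines like "[21/Aug/2012:00:31:27 -0700] /redir/abc.html".
# The UTC offset field is ignored (assumed equal to the local time zone).
count_range() {
    local file="$1" start end
    start=$(date +%Y%m%d%H%M%S --date "$2")
    end=$(date +%Y%m%d%H%M%S --date "$3")
    awk -v start="$start" -v end="$end" '
        BEGIN {
            # Map month abbreviations to two-digit numbers.
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) mon[m[i]] = sprintf("%02d", i)
        }
        {
            # $1 looks like [21/Aug/2012:00:31:27
            split(substr($1, 2), p, "[/:]")   # p = 21, Aug, 2012, 00, 31, 27
            key = p[3] mon[p[2]] p[1] p[4] p[5] p[6]
            if (key >= start && key <= end) print $3
        }' "$file" | sort | uniq -c | sort -n
}
```

For example: count_range log.total "1 week ago" "today". Because every line is compared, this version also does not depend on the file being sorted.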

How to count accesses to files from an Apache access log using Bash
