How can I parse Amazon S3 logs using awk?

Date: 2015-04-23 18:44:33

Tags: bash amazon-web-services awk amazon-s3

awk requires you to supply a field separator, which defaults to the space character, but in Amazon S3 server access logs some fields contain spaces internally. For example, the time field contains one (e.g. [06/Feb/2014:00:00:38 +0000]), and the key field may contain one as well.

http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
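To see the problem concretely, here is a sketch (using a shortened, hypothetical log line) of how awk's default whitespace splitting cuts the bracketed timestamp in two:

```shell
# Default FS splits on runs of whitespace, so the timestamp
# field is broken across $2 and $3.
printf '%s\n' 'bucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3' |
awk '{ print $2; print $3 }'
# prints:
# [06/Feb/2014:00:00:38
# +0000]
```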

Is there a way to parse these in awk, or do I need to transform them somehow before passing them to awk?

If so, how can I do that? I am currently combining all the log files with:

find . -type f -exec cat {} >> ../compiled.log \;

and then running awk on the result. Is there a way to transform them during the cat step without mangling the key field, while still keeping them parseable by awk?
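As an aside, the intermediate compiled.log can be skipped entirely by piping the combined stream straight into awk. A minimal sketch; the *.log glob is an assumption about how the log files are named:

```shell
# Stream every matching log file into a single awk process,
# with no intermediate compiled.log on disk.
find . -type f -name '*.log' -exec cat {} + |
awk 'END { print NR " lines total" }'
```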

Edit: I tried gawk with FPAT, but I am not getting the results I expected. For the time field

[06/Feb/2014:00:00:38 +0000]

my output is:

$3 = <[06/Feb/2014:00:00:38>
$4 = <+0000]>

For me $4 is '+0000]' and $3 is missing the rest of the date, so it seems FPAT is having no effect?

1 Answer:

Answer 0 (score: 1)

GNU awk with FPAT to the rescue:

awk 'BEGIN{ FPAT = "(\"[^\"]+\")|(\\[[^]]+\\])|([^ ]+)"} {
     for (i = 1; i <= NF; i++) printf "$%d = <%s>\n", i, $i}' s3.log

Output (using the sample log line from the linked S3 documentation):

$1 = <79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be>
$2 = <mybucket>
$3 = <[06/Feb/2014:00:00:38 +0000]>
$4 = <192.0.2.3>
$5 = <79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be>
$6 = <3E57427F3EXAMPLE>
$7 = <REST.GET.VERSIONING>
$8 = <->
$9 = <"GET /mybucket?versioning HTTP/1.1">
$10 = <200>
$11 = <->
$12 = <113>
$13 = <->
$14 = <7>
$15 = <->
$16 = <"-">
$17 = <"S3Console/0.4">
$18 = <->
...
...
...

Code Demo
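If gawk is unavailable, the same tokenization can be approximated in POSIX awk with match(), since POSIX regex matching is leftmost-longest and so prefers the full quoted or bracketed token over a bare word. A sketch, not part of the original answer:

```shell
# Consume one token at a time: a quoted string, a bracketed
# timestamp, or a plain word, each with any trailing spaces.
printf '%s\n' 'mybucket [06/Feb/2014:00:00:38 +0000] "GET /mybucket?versioning HTTP/1.1" 200' |
awk '{
  line = $0; n = 0
  while (match(line, /^("[^"]+"|\[[^]]+\]|[^ ]+) */)) {
    tok = substr(line, 1, RLENGTH)
    sub(/ +$/, "", tok)                 # strip the trailing spaces
    printf "$%d = <%s>\n", ++n, tok
    line = substr(line, RLENGTH + 1)
  }
}'
# $2 comes out as the whole <[06/Feb/2014:00:00:38 +0000]> and
# $3 as the whole quoted request line.
```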