It requires you to provide a delimiter for each field, which defaults to the space character, but in Amazon S3's server logs a field can itself contain spaces. For example, the time field contains one (e.g. [06/Feb/2014:00:00:38 +0000] contains a space), and so can the key field.
http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
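To see the problem concretely, here is a minimal illustration (the log line is a hypothetical fragment, not one from the original post) of how awk's default whitespace splitting breaks the bracketed time field in two:

```shell
# Default field splitting treats every space as a separator,
# so the bracketed timestamp is split across two fields.
printf 'mybucket [06/Feb/2014:00:00:38 +0000] GET\n' |
awk '{ printf "$2 = <%s>\n$3 = <%s>\n", $2, $3 }'
# $2 = <[06/Feb/2014:00:00:38>
# $3 = <+0000]>
```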
Is there a way to parse these in awk, or do I need to transform them somehow before passing them to awk? If so, how can I do that? I'm currently combining all the log files with:

find . -type f -exec cat {} >> ../compiled.log \;

and then running awk over the result. Is there a way to transform them as they are combined, without breaking my key fields, while still keeping them parseable by awk?

Edit: I tried gawk with FPAT, but I'm not getting the results I expected. The time field should come through as [06/Feb/2014:00:00:38 +0000], but in my output $4 is '+0000' and $3 is missing the rest of the date, so it doesn't seem to be taking effect?
Answer 0 (score: 1)
With GNU awk you can set FPAT, which defines what a field looks like rather than what separates fields; here a field is a double-quoted string, a bracketed string, or a run of non-space characters:

awk 'BEGIN{ FPAT = "(\"[^\"]+\")|(\\[[^]]+\\])|([^ ]+)"} {
for (i = 1; i <= NF; i++) printf "$%d = <%s>\n", i, $i}' s3.log
Output (using the sample log line from the linked S3 documentation):
$1 = <79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be>
$2 = <mybucket>
$3 = <[06/Feb/2014:00:00:38 +0000]>
$4 = <192.0.2.3>
$5 = <79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be>
$6 = <3E57427F3EXAMPLE>
$7 = <REST.GET.VERSIONING>
$8 = <->
$9 = <"GET /mybucket?versioning HTTP/1.1">
$10 = <200>
$11 = <->
$12 = <113>
$13 = <->
$14 = <7>
$15 = <->
$16 = <"-">
$17 = <"S3Console/0.4">
$18 = <->
...
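One caveat: FPAT is a GNU awk (gawk 4.0+) feature, and other awk implementations silently ignore it and fall back to whitespace splitting, which matches the symptom described in the question's edit. As a portable sketch of the same idea (an assumption on my part, not part of the original answer), POSIX awk's match() with RSTART/RLENGTH can extract the same field pattern one field at a time:

```shell
# Repeatedly match the next field (a quoted string, a bracketed
# string, or a run of non-spaces) and consume it from the line.
awk '{
  s = $0; n = 0
  while (match(s, /("[^"]+")|(\[[^]]+\])|([^ ]+)/)) {
    printf "$%d = <%s>\n", ++n, substr(s, RSTART, RLENGTH)
    s = substr(s, RSTART + RLENGTH)
  }
}' s3.log
```

POSIX leftmost-longest matching ensures the quoted and bracketed alternatives win over the plain non-space run whenever a field starts with `"` or `[`.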