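I have a very specific problem that I can't solve: parsing and merging related data from different lines.
My file contains text in the following format: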
======================================================
8:27:24 PM http://10.11.12.13:80
======================================================
GET /dog-pictures HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed
======================================================
======================================================
8:28:56 PM http://192.114.126.245:80
======================================================
GET /flowers HTTP/1.1
Host: 10.11.12.13
Language: english
======================================================
======================================================
8:29:07 PM http://10.11.12.13:80
======================================================
GET /africas-animals HTTP/1.1
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: open
======================================================
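As shown above, each group of data in the text file is delimited by three lines of equals signs (=======), but the number of data lines inside a group can vary.
I need output in the following format: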
http://10.11.12.13/dog-pictures
http://192.114.126.245/flowers
http://10.11.12.13/africas-animals
An illustration of the parts I need to merge:
======================================================
8:27:24 PM http://10.11.12.13:80 <--- Gets the first part from here
======================================================
GET /dog-pictures HTTP/1.1 <--- Gets the second part from here
Host: 10.11.12.13
Language: english
Agent: Unknown
Connection: closed
======================================================
Any help with this problem is much appreciated. Thanks in advance.
Answer 0 (score: 1)
Try this Perl one-liner
in a shell:
perl -lane '
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE)/) {
        $path = $F[1];
    }
    if (/^Host/) {
        print "$scheme://$F[1]$path";
    }
' file.txt
The same thing as a script, generated with perl -MO=Deparse and slightly adjusted:
#!/usr/bin/env perl
# mimic the `-l` switch so that print behaves like "say"
BEGIN { $/ = "\n"; $\ = "\n"; }
use strict; use warnings;
my ($scheme, $path);
# magic diamond operator: read the files named on the command line
while (<ARGV>) {
    chomp $_;
    # split the current line on whitespace into the @F array
    my (@F) = split(' ', $_, 0);
    # regex to catch the scheme (http) on the timestamp line
    if (/^\d+:\d+:\d+\s+\w+\s+([^:]+):/) {
        $scheme = $1;
    }
    # if the current line starts with an HTTP verb, store the
    # second column in $path
    if (/^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE)/) {
        $path = $F[1];
    }
    # if the current line is the Host header, print the assembled URL
    if (/^Host/) {
        print "${scheme}://$F[1]$path";
    }
}
Usage:
chmod +x script.pl
./script.pl file.txt
Output:
http://10.11.12.13/dog-pictures
http://10.11.12.13/flowers
http://10.11.12.13/africas-animals
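Note that the script above takes the host from the Host header, so the second record prints 10.11.12.13 rather than the 192.114.126.245 shown in the desired output. If the host should come from the timestamp line instead, something like the following awk sketch might work. It assumes the timestamp line always has the URL as its third whitespace-separated field and ends in a :port suffix, as in the sample; the function name urls_from_log is hypothetical.

```shell
# Hypothetical helper: prints one URL per record, reading the log
# from the files given as arguments, or from stdin if there are none.
urls_from_log() {
  awk '
    # Timestamp line, e.g. "8:27:24 PM http://10.11.12.13:80":
    # remember the third field with the trailing :port removed.
    /^[0-9]+:[0-9]+:[0-9]+ / { base = $3; sub(/:[0-9]+$/, "", base) }
    # Request line, e.g. "GET /dog-pictures HTTP/1.1":
    # append the path (second field) to the remembered base and print.
    /^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE) / { print base $2 }
  ' "$@"
}
```

It could then be called as urls_from_log file.txt.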
Answer 1 (score: 1)
The following may help you:
use strict;
use warnings;
open my $fh, '<', 'data.txt' or die $!;
# Read a file line
while (<$fh>) {
    # If url captured on line beginning with time and read (separator) line
    if ( my ($url) = /^\d+:\d+:\d+.+?(\S+):\d+$/ and <$fh> ) {
        # Capture path
        my ($path) = <$fh> =~ /\s+(\/\S+)\s+/;
        print "$url$path\n" if $url and $path;
    }
}
Output:
http://10.11.12.13/dog-pictures
http://192.114.126.245/flowers
http://10.11.12.13/africas-animals
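Only two lines in each record contain the information you want, and they are separated by a line of equals signs. The first regex tries to match the time string and captures the URL on that line, without the port. The "and <$fh>" part reads one more line to get past the separator. The second regex captures the path from the line after that. Finally, the URL and the path are printed together.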
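For comparison, the same read-ahead idea can be sketched in awk with getline. This is only a rough equivalent under the same assumptions about the sample (the URL is the last field of the timestamp line and ends in :port, and the request line comes exactly two lines later); the function name urls_by_lookahead is hypothetical.

```shell
# Hypothetical helper: on each timestamp line, read ahead two lines
# (separator, then request line) and print URL + path.
urls_by_lookahead() {
  awk '
    # Timestamp line ending in :port, e.g. "8:27:24 PM http://10.11.12.13:80"
    /^[0-9]+:[0-9]+:[0-9]+ .*:[0-9]+$/ {
      url = $NF
      sub(/:[0-9]+$/, "", url)        # drop the trailing :port
      # First getline skips the separator, second loads the request line.
      if ((getline) > 0 && (getline) > 0 && $2 ~ /^\//)
        print url $2                  # $2 is now the request path
    }
  ' "$@"
}
```

Called as urls_by_lookahead data.txt, it reads the same file the Perl script opens.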
Answer 2 (score: 0)
Perl:
perl -lane 'if(/http/){($x=$F[2])=~s/:\d+$//} if(/^GET/){print $x.$F[1]}' your_file
If you would rather use awk:
awk '/http/{x=$3; sub(/:[0-9]+$/,"",x)} /^GET/{print x $2}' your_file