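I am trying to write a Perl script that generates XML from arbitrary tabular data supplied in a text file. For the sake of discussion, say I want to take the output of the Linux command
df -k
and parse it in my Perl script to generate XML dynamically.
Sample check_disk_usage.log: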
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3             56776092   5431448  48413988  11% /
/dev/sda1               101086     18993     76874  20% /boot
tmpfs                  2021888         0   2021888   0% /dev/shm
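Now, to generate the XML, I need to extract the headers from this table and store them in an array for later use (they will serve as the opening and closing tags in the XML). This is how I do it: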
open my $file, '<', "$dir/check_disk_usage.log"
    or die "Cannot open $dir/check_disk_usage.log: $!";
my $firstLine = <$file>;
close $file;
my (@header) = $firstLine =~ /(\S+)/g;
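That is, I look for every run of one or more non-whitespace characters (effectively a word) and save the matches in an array. This works fine as long as each header name is a single word, e.g. Filesystem, 1K-blocks, Used, etc.
However, it breaks on a header name such as "Mounted on", because "Mounted" and "on" are treated as separate matches and are therefore stored as separate array elements. Is there a way to identify/extract the headers of a table effectively?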
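The failure described above is easy to reproduce on its own. A small sketch, using the header line from the sample log as a literal string (the variable names are only illustrative):

```perl
use strict;
use warnings;

# Header line copied from the sample log.
my $first_line = 'Filesystem 1K-blocks Used Available Use% Mounted on';

# Same pattern as in the question: every run of non-whitespace is one match.
my @header = $first_line =~ /(\S+)/g;

print scalar(@header), " fields\n";    # 7, although the table has 6 columns
print "  $_\n" for @header;            # "Mounted" and "on" come out separately
```

Because the pattern only looks at one line, it has no way of telling a space inside a header from a space between columns.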
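PS: I know I could use awk to substitute the problematic pattern and then parse the file. But then I would need to know the "offending pattern" in advance, which is not feasible because I intend this script to work on arbitrary tabular data.
PPS: Although I am using Perl, I am open to other solutions as well (e.g. PHP). Thanks for your help.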
Answer 0 (score: 1)
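Judging by the look of the data, the values are separated at the character positions where every line has whitespace. If a position holds whitespace in some lines but not in others, it is not a delimiter. That suggests using a mask to work out where the header should be split.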
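To make the mask idea concrete, here is a small sketch that prints the mask under the header line. It assumes the log keeps df's fixed-width column alignment; the sample rows are rebuilt with sprintf using an assumed layout string, and the regex-based run detection is just one possible way to read the mask, not the only one.

```perl
use strict;
use warnings;

# Sample rows rebuilt with an ASSUMED df-style fixed-width layout.
my $layout = '%-20s %9s %9s %9s %4s %s';
my @lines  = map { sprintf $layout, @$_ } (
    [ 'Filesystem', '1K-blocks', 'Used', 'Available', 'Use%', 'Mounted on' ],
    [ '/dev/sda3', 56776092, 5431448, 48413988, '11%', '/' ],
    [ '/dev/sda1', 101086,   18993,   76874,    '20%', '/boot' ],
    [ 'tmpfs',     2021888,  0,       2021888,  '0%',  '/dev/shm' ],
);

# One digit per character position: 1 if any line has a non-blank there.
my @mask;
for my $line (@lines) {
    my @chars = split //, $line;
    for my $i (0 .. $#chars) {
        $mask[$i] = 1 if $chars[$i] =~ /\S/;
        $mask[$i] //= 0;
    }
}
my $mask_str = join '', @mask;

# Every run of 1s is one column: record [start, length].
my @spans;
while ($mask_str =~ /1+/g) {
    push @spans, [ $-[0], $+[0] - $-[0] ];
}

print "$lines[0]\n$mask_str\n";
print join(' ', map { sprintf '[%d,%d]', @$_ } @spans), "\n";
```

With this sample the mask contains six runs of 1s, and the last run covers "Mounted on" as a single column, because the final "m" of /dev/shm sits in the same position as the space in the header.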
It's a bit ugly, but:
#!/usr/bin/perl
use strict;
use warnings;

# Read the file provided on STDIN (or named on the command line), determine
# the column boundaries, and print the individual elements of each line.
my @lines = map { chomp; $_ } <>;

# The mask records, per character position, whether that position has ever
# held a NON-whitespace character in any line.
my @mask = ();
foreach my $line (@lines) {
    my @line = split //, $line;
    foreach my $index (0..$#line) {
        $mask[$index] ||= ($line[$index] =~ /\S/) ? 1 : 0;
    }
}

# At this point the zeros in the mask mark where to split.
# Turn the runs of ones into substr ranges,
# so 000011110000 would become [4, 4].
my @substrings = ();    # will contain [from, length]
my $last_transition = 0;
my $last_value = $mask[0];

# Each time the mask flips between 0 and 1, $last_transition is updated.
# If the value before the flip was 1, a column has just ended and its
# range is recorded.
foreach my $index (1..$#mask) {
    if ($mask[$index] != $last_value) {
        if ($last_value) {
            push @substrings, [$last_transition, $index - $last_transition];
        }
        $last_transition = $index;
        $last_value = $mask[$index];
    }
}

# Handle the end of the line, which is considered a transition to 0
if ($last_value) {
    push @substrings, [$last_transition, ($#mask + 1 - $last_transition)];
}

# Just print them to show that it works; you would collect these instead.
foreach my $line (@lines) {
    foreach my $split (@substrings) {
        # A short line may end before this column starts; treat that as empty.
        my $element = $split->[0] < length($line)
            ? substr($line, $split->[0], $split->[1])
            : '';
        $element =~ s/^\s+|\s+$//g;    # trim both ends
        print "$line -> $element\n";
    }
}
Output:
Filesystem 1K-blocks Used Available Use% Mounted on -> Filesystem
Filesystem 1K-blocks Used Available Use% Mounted on -> 1K-blocks
Filesystem 1K-blocks Used Available Use% Mounted on -> Used
Filesystem 1K-blocks Used Available Use% Mounted on -> Available
Filesystem 1K-blocks Used Available Use% Mounted on -> Use%
Filesystem 1K-blocks Used Available Use% Mounted on -> Mounted on
/dev/sda3 56776092 5431448 48413988 11% / -> /dev/sda3
/dev/sda3 56776092 5431448 48413988 11% / -> 56776092
/dev/sda3 56776092 5431448 48413988 11% / -> 5431448
/dev/sda3 56776092 5431448 48413988 11% / -> 48413988
/dev/sda3 56776092 5431448 48413988 11% / -> 11%
/dev/sda3 56776092 5431448 48413988 11% / -> /
/dev/sda1 101086 18993 76874 20% /boot -> /dev/sda1
/dev/sda1 101086 18993 76874 20% /boot -> 101086
/dev/sda1 101086 18993 76874 20% /boot -> 18993
/dev/sda1 101086 18993 76874 20% /boot -> 76874
/dev/sda1 101086 18993 76874 20% /boot -> 20%
/dev/sda1 101086 18993 76874 20% /boot -> /boot
tmpfs 2021888 0 2021888 0% /dev/shm -> tmpfs
tmpfs 2021888 0 2021888 0% /dev/shm -> 2021888
tmpfs 2021888 0 2021888 0% /dev/shm -> 0
tmpfs 2021888 0 2021888 0% /dev/shm -> 2021888
tmpfs 2021888 0 2021888 0% /dev/shm -> 0%
tmpfs 2021888 0 2021888 0% /dev/shm -> /dev/shm
Obviously, you would process the first line as the element names rather than printing it out.
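Using the first line as element names needs one extra step, because headers such as "Mounted on", "Use%" and "1K-blocks" are not legal XML element names (no spaces or "%", and no leading digit). A minimal sketch of that step, assuming hypothetical helpers tag_name, xml_escape and rows_to_xml and made-up wrapper elements <table> and <row>; none of these come from the script above, and the renaming rule is only one possible choice:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a column header into a legal XML element name.
sub tag_name {
    my ($header) = @_;
    (my $name = $header) =~ s/[^A-Za-z0-9_.-]+/_/g;   # "Mounted on" -> "Mounted_on"
    $name = "_$name" if $name =~ /^[^A-Za-z_]/;       # "1K-blocks"  -> "_1K-blocks"
    return $name;
}

# Hypothetical helper: escape the characters that are special in XML text.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build one <row> per data line, with one child element per column.
sub rows_to_xml {
    my ($header, $rows) = @_;
    my @tags = map { tag_name($_) } @$header;
    my $xml  = "<table>\n";
    foreach my $row (@$rows) {
        $xml .= "  <row>\n";
        foreach my $i (0 .. $#tags) {
            my $value = defined $row->[$i] ? $row->[$i] : '';
            $xml .= "    <$tags[$i]>" . xml_escape($value) . "</$tags[$i]>\n";
        }
        $xml .= "  </row>\n";
    }
    return $xml . "</table>\n";
}

# Fields as they would be collected from the split step.
my @header = ('Filesystem', '1K-blocks', 'Used', 'Available', 'Use%', 'Mounted on');
my @rows   = (
    [ '/dev/sda3', '56776092', '5431448', '48413988', '11%', '/' ],
    [ 'tmpfs',     '2021888',  '0',       '2021888',  '0%',  '/dev/shm' ],
);
print rows_to_xml(\@header, \@rows);
```

Renaming loses the original header text, so if the exact labels matter, an alternative would be to keep a fixed element name and store the header in an attribute instead.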