Good day.
The contents of my text file, tmp.txt (a very large file), are as follows.
constant fixup private AlarmFileName = <A "C:\\TMP\\ALARM.LOG"> /* A Format */
constant fixup ConfigAlarms = <U1 0> /* U1 Format */
constant fixup ConfigEvents = <U2 0> /* U2 Format */
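My parsing code is below.
The code cannot handle "C:\\TMP\\ALARM.LOG" (a quoted string).
I don't know how to change the pattern \s+([a-zA-Z0-9])+> so that it handles both a plain [a-zA-Z0-9] string (the 0 above) and a quoted string (the "C:\\TMP\\ALARM.LOG" above).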
$source_file = "tmp.txt";
$dest_xml_file = "my.xml";
#Check existance of root directory
open(SOURCE_FILE, "$source_file") || die "Fail to open file $source_file";
open(DEST_XML_FILE, ">$dest_xml_file") || die "Coult not open output file $dest_xml_file";
$x = 0;
print DEST_XML_FILE "<!-- from tmp.txt-->\n";
while (<SOURCE_FILE>)
{
&ConstantParseAndPrint;
}
sub ConstantParseAndPrint
{
if ($x == 0)
{
if(/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9])+>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
{
$name1 = $1;
$name2 = $2;
$name3 = $3;
$name4 = $4;
$name5 = $5;
$name6 = $6;
$name7 = $7;
printf DEST_XML_FILE "\t\t$name1";
printf DEST_XML_FILE "\t\t$name2";
printf DEST_XML_FILE "\t\t$name3";
printf DEST_XML_FILE "\t\t$name4";
printf DEST_XML_FILE "\t\t$name5";
printf DEST_XML_FILE "\t\t$name6";
printf DEST_XML_FILE "\t\t$name7";
$x = 1;
}
}
}
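Thanks for your input.
** HELLO ALL,
Thank you for the many excellent solutions. I am a newbie, and I want to do more research based on your posts.
Thanks a lot.**
Answer 0: (score: 10)
I'm not going to write your regex for you, or give you something to cut and paste into your code. At the rate your regex is going, it will break on the next special case anyway. What I will give you is a better approach.
Split each line into the left-hand and right-hand sides of the assignment.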
my($lhs, $rhs) = split m{\s* = \s*}x, $line, 2;
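Now it is much easier to work with each side on its own. To pull the information out of the left side, just split on whitespace to get all the flags (constant, fixup, and so on); the last word is the name being assigned to.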
my @flags = split /\s+/, $lhs;
my $name = pop @flags;
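Then you can filter lines by @flags if you need to.
The value, presumably the part inside the angle brackets, is easy to get. A non-greedy regex ensures that a line such as foo = <bar> /* comment <stuff> */ is handled correctly.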
my($value) = $rhs =~ /<(.*?)>/;
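As you can see, this approach avoids having to guess which special keywords (constant, fixup, private) might appear in the file.
I don't know what else is in this file; you didn't say.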
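The three steps above can be combined into one small routine. This is only a sketch: the helper name parse_constant_line, the returned hash layout, and the extra step that splits the bracketed value into a type and a data part are assumptions for illustration, not part of the original answer. It also assumes the value itself never contains a > character.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "flags... name = <TYPE DATA> /* comment */" line.
# Returns a hash ref, or nothing when the line has no assignment or no <...> value.
sub parse_constant_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim both ends so the splits see clean input

    # Step 1: left-hand and right-hand side of the assignment.
    my ($lhs, $rhs) = split m{\s* = \s*}x, $line, 2;
    return unless defined $rhs;

    # Step 2: every word on the left is a flag, except the last, which is the name.
    my @flags = split /\s+/, $lhs;
    my $name  = pop @flags;
    return unless defined $name;

    # Step 3: non-greedy match of whatever sits inside the first <...>.
    my ($value) = $rhs =~ /<(.*?)>/;
    return unless defined $value;

    # Assumed extra step: "U1 0" becomes type "U1" and data "0".
    my ($type, $data) = split /\s+/, $value, 2;
    $data =~ s/^"(.*)"$/$1/ if defined $data;    # drop surrounding double quotes

    return { flags => \@flags, name => $name, type => $type, data => $data };
}

# A single-quoted here-doc keeps the backslashes exactly as they appear in tmp.txt.
my @sample = split /\n/, <<'END';
constant fixup private AlarmFileName = <A "C:\\TMP\\ALARM.LOG"> /* A Format */
constant fixup ConfigAlarms = <U1 0> /* U1 Format */
END

for my $line (@sample) {
    my $rec = parse_constant_line($line) or next;
    print join("\t", @{ $rec->{flags} }, $rec->{name}, $rec->{type}, $rec->{data} // ''), "\n";
}
```

Because the quoted path is handled after the brackets have been stripped, the quoting rule lives in one short substitution instead of inside a long line-level regex.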
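Answer 1: (score: 3)
Your code has some major design flaws. I haven't solved your problem, but I have cleaned up your code.
Most importantly, don't use global variables. In a relatively short block of code you use three of them. That is an invitation to mysterious, hard-to-trace bugs, and it becomes a bigger problem as your project grows over time.
Use Perl::Critic. It will help you improve your code.
Here is an annotated, cleaned-up version of your code: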
# Always use strict and warnings.
# It prevents bugs.
use strict;
use warnings;
my $source_file = "tmp.txt";
my $dest_xml_file = "my.xml";
# You aren't checking the existence of anything here:
#Check existance of root directory
# Is this a TODO item?
# Use 3 argument open with a lexical filehandle.
# Adding $! to your error messages makes them more useful.
open my $source_fh, '<', $source_file
or die "Fail to open file $source_file - $!";
open my $dest_fh, '>', $dest_xml_file
or die "Could not open output file $dest_xml_file - $!";
my $x = 0; # What the heck does this do? Give it a meaningful name or
# delete it.
print $dest_fh "<!-- from tmp.txt-->\n";
while (my $line = <$source_fh>)
{
# Don't use global variables.
# Explicitly pass all data your sub needs.
# Any values that need to be applied to external
# data should be applied by the calling function,
# from data that is returned.
$x = ConstantParseAndPrint( $line, $x, $dest_fh );
}
sub ConstantParseAndPrint {
my $line = shift;
my $mystery_value = shift;
my $fh = shift;
if($mystery_value == 0) {
# qr{} is a handy way to build a regex.
# using {} instead of // to mark the boundaries helps
# cut down on the escaping required when your pattern
# contains the '/' character.
# Use the x regex modifier to allow whitespace and
# comments in your regex.
# This is very important when you can't avoid using a big, complex regex.
# But really don't do it this way at all.
# Do what Schwern says.
my $line_match = qr{
^ \s* # Skip leading spaces
(constant) \s* # look for the constant keyword
(fixup|/\*fixup\*/|) \s* # look for the fixup keyword
(private|) \s* # look for the private keyword
(\w+) \s+ # Get parameter name
= \s+
< # get bracketed values
([a-zA-Z0-9]+) \s+ # First value
([a-zA-Z0-9])+ # Second value
> \s*
(/\*\s*(.*?)\s*\*/|) # Find any trailing comment
(\r|\n|\s) # Trailing whitespace
}x;
if( $line =~ /$line_match/ ) {
# Any time you find yourself making variables
# with names like $foo1, $foo2, etc, use an array.
my @names = ( $1, $2, $3, $4, $5, $6, $7 );
# printf is for printing formatted data.
# If you aren't using any format codes, use print.
# Using an array makes it easy to print all the tokens.
print $fh "\t\t$_" for @names;
$mystery_value = 1;
}
}
return $mystery_value;
}
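As for your parsing problem, follow Schwern's advice. A big, complex regex is a sign that you need to simplify. Break the big problem into manageable tasks.
Answer 2: (score: 2)
Answer 3: (score: 1)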
#!/usr/bin/perl
$source_file = "tmp.txt";
$dest_xml_file = "my.xml";
#Check existance of root directory
open(SOURCE_FILE, "$source_file") || die "Fail to open file $source_file";
open(DEST_XML_FILE, ">$dest_xml_file") || die "Could not open output file $dest_xml_file";
$x = 0;
print DEST_XML_FILE "<!-- from tmp.txt-->\n";
while (<SOURCE_FILE>)
{
&ConstantParseAndPrint;
}
sub ConstantParseAndPrint
{
if ($x == 0)
{
# if(/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+([a-zA-Z0-9])+>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
if(/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\6>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
{
$name1 = $1;
$name2 = $2;
$name3 = $3;
$name4 = $4;
$name5 = $5;
$name6 = $7;
$name7 = $8;
printf DEST_XML_FILE "\t\t$name1";
printf DEST_XML_FILE "\t\t$name2";
printf DEST_XML_FILE "\t\t$name3";
printf DEST_XML_FILE "\t\t$name4";
printf DEST_XML_FILE "\t\t$name5";
printf DEST_XML_FILE "\t\t$name6";
printf DEST_XML_FILE "\t\t$name7\n";
# $x = 1;
}
}
}
Use the following parsing code:
if(/^\s*(constant)\s*(fixup|\/\*fixup\*\/|)\s*(private|)\s*(\w+)\s+=\s+<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\6>\s*(\/\*\s*(.*?)\s*\*\/|)(\r|\n|\s)/)
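I added handling for single and double quotes, using a backreference so that the closing quote must match the opening one. I also updated the character class for the path: it now includes the colon (:), dot (.), and backslash (\) as well as alphanumeric characters. Because the quote is captured as group 6, the value and the comment move to $7 and $8.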
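To see the backreference in isolation, here is a minimal sketch that applies only the bracketed part of that pattern. The helper name bracket_value is hypothetical, and the group numbers differ from the full regex because the leading groups are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the type and the value out of "<TYPE VALUE>",
# where VALUE may be bare, single-quoted, or double-quoted.
sub bracket_value {
    my ($text) = @_;
    # $1 = type, $2 = opening quote (possibly empty), $3 = value.
    # \2 forces the closing quote to be the same character as the opening one.
    if ($text =~ /<([a-zA-Z0-9]+)\s+(["']?)([a-zA-Z0-9.:\\]+)\2>/) {
        return ($1, $3);
    }
    return;    # empty list: no match
}

my ($type, $value) = bracket_value('<U1 0>');
print "$type $value\n";
```

A value opened with one kind of quote and closed with the other does not match, because the empty-quote alternative then fails on the quote character, which is not in the character class.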
Answer 4: (score: 1)
I deliberately removed the capture groups (add them back if you need them):
m{^\s*constant\s+fixup\s+(?:private\s+)?\w+\s*=\s*<[^>]+>(?:\s*/\*(?:\s*\w*)+\*/)?$};
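If you do want the captures back, one possible variant uses named groups. This is a sketch rather than the answer's own pattern: the group names, the helper name match_constant, and the simplified comment sub-pattern (a lazy .*? instead of the nested word match) are all assumptions. Like the pattern above, it requires the fixup keyword.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref of the named captures, or nothing if the
# line does not match.
sub match_constant {
    my ($line) = @_;
    if ($line =~ m{
            ^ \s* constant \s+ fixup \s+ (?:private \s+)?
            (?<name> \w+ ) \s* = \s*
            < (?<value> [^>]+ ) >                     # everything inside the brackets
            (?: \s* /\* \s* (?<comment> .*? ) \s* \*/ )?   # optional trailing comment
            \s* $
        }x) {
        my %captures = %+;    # copy now: %+ is reset by the next successful match
        return \%captures;
    }
    return;
}

my $m = match_constant('constant fixup ConfigEvents = <U2 0> /* U2 Format */');
print "$m->{name}: $m->{value}\n" if $m;
```

Named groups avoid the renumbering problem that appears whenever a new group is inserted in the middle of a long pattern.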
Answer 5: (score: 1)
Normalize first!
$yourstring =~ s,\\,/,g; # transform '\' into '/'
$yourstring =~ s,/+,/,g; # transform multiple '/' into one '/'
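Wrapped in a function, the two substitutions look like the sketch below; the name normalize_slashes is hypothetical. Note that collapsing repeated slashes is only safe for plain local paths like the one in the question: it would also flatten the leading // of a UNC path or the // in a URL.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the two substitutions above to a copy of the string.
sub normalize_slashes {
    my ($path) = @_;
    $path =~ s{\\}{/}g;    # every backslash becomes a forward slash
    $path =~ s{/+}{/}g;    # runs of slashes collapse to a single slash
    return $path;
}

# In single quotes, \\\\ is two literal backslashes, as in the sample file.
print normalize_slashes('C:\\\\TMP\\\\ALARM.LOG'), "\n";
```

After normalization the path contains only letters, digits, colons, dots, and forward slashes, so the character class that has to match it becomes simpler.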