Question

I'm having a bit of a hard time with this. I'm trying to parse a file with data like the following:

"1111 Some random descriptive text can have numbers and letters",
// :property1.Some description
// :property2.A different description
// :property3.Yet another
"2222 More random text here",
// :property1.Some description
// :property1.A different description
// :property2.Yet another description
// :property3.Yet another

I want to parse this and create HTML files from it.

Currently I read the file into an array like this:

@array = <FILE>;

#Put it in a single long string:
$long_string = join("",@array);

#Then trying to split it with the following regex:
@split_array = split(/\"\d{4}.+",/,$long_string);

My plan was to somehow save the matched string and then associate it with the property fields that follow it...

I'm really doubting my approach now.

Answer 1

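When parsing text, you need to identify the key points of leverage that help you distinguish one piece of information from another. Here is what I see in your text: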

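Each line is a distinct unit.
Some lines begin with // and some do not.
There is some regularity at the beginning of each line, but a lot of variation in the rest.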

By slurping the document and joining it into a single string, you are weakening those points of leverage.
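To illustrate the point, here is a small sketch of what the split from the question does to a joined string. The two-record sample data is made up for the demonstration. Because split treats the match as a delimiter, the header text itself is discarded, so the link between a header and its property lines is lost.

```perl
use strict;
use warnings;

# Made-up sample in the same shape as the file from the question.
my $long_string = join "",
    qq{"1111 First header",\n},
    qq{// :property1.Some description\n},
    qq{"2222 Second header",\n},
    qq{// :property2.Another description\n};

# Same regex as in the question: the header lines act as delimiters.
my @split_array = split /"\d{4}.+",/, $long_string;

# Prints 3: one empty leading field plus the two property chunks.
print scalar(@split_array), "\n";

# Prints 0: none of the remaining pieces contain the header text.
my $with_header = grep { /First header|Second header/ } @split_array;
print "$with_header\n";
```

Wrapping the pattern in capturing parentheses would make split return the headers as well, but you would still have to pair each header with its chunk by position afterwards.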

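Another key parsing strategy is to break things down into simple, easily understood steps. Here, the run-one-regex-against-a-giant-string strategy is usually the wrong direction.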

Here is how I would start:

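use strict;
use warnings;

open(my $file_handle, '<', 'input_file_name') or die $!;

# Read one line at a time so that each line stays a distinct unit.
while (my $line = <$file_handle>){
    if ( $line =~ /^"(\d+)/ ){
        # Header line: starts with a double quote followed by digits.
        my $number = $1;
        ...
    }
    else {
        # Any other line, such as a // property line.
        ...
    }
}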
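One possible way to fill in the two branches is sketched below. It is only an illustration, not the answerer's code: the helper name parse_records, the hash keys, and both regexes are assumptions based on the sample data in the question. Each header line starts a new record, and each property line is attached to the most recent header.

```perl
use strict;
use warnings;

# Hypothetical helper: groups lines into one record per header line.
sub parse_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /^"(\d+)\s+(.*?)",?\s*$/ ) {
            # Header line: start a new record with an empty property list.
            push @records, { number => $1, text => $2, properties => [] };
        }
        elsif ( $line =~ m{^//\s*:(\w+)\.(.*)$} ) {
            # Property line: attach it to the most recent header, if any.
            next unless @records;
            push @{ $records[-1]{properties} },
                { name => $1, description => $2 };
        }
        # Lines matching neither pattern are skipped in this sketch.
    }
    return @records;
}

# Sample lines copied from the question; a real script would read
# them from the file handle instead.
my @sample = (
    qq{"1111 Some random descriptive text can have numbers and letters",\n},
    qq{// :property1.Some description\n},
    qq{// :property2.A different description\n},
    qq{// :property3.Yet another\n},
    qq{"2222 More random text here",\n},
    qq{// :property1.Some description\n},
    qq{// :property1.A different description\n},
    qq{// :property2.Yet another description\n},
    qq{// :property3.Yet another\n},
);

my @records = parse_records(@sample);

for my $record (@records) {
    print "$record->{number}: $record->{text}\n";
    print "  $_->{name} => $_->{description}\n"
        for @{ $record->{properties} };
}
```

Properties are kept in an array rather than a hash keyed by name because the sample repeats property1 under the second header, and a hash would silently drop one of the two. Once the records exist, generating the HTML becomes a separate step that loops over @records.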

Perl: parsing a multi-line file with varying field sizes
