Question

I need to get the titles of these songs from a text file that contains all the information. The text file looks like this:

TRMMCAU128F9332597<SEP>SOEEWIZ12AB0182B09<SEP>YGGDRASIL<SEP>Beyond the Borders of Sanity
TRMMCCS12903CBEA4A<SEP>SOARHKB12AB0189EEA<SEP>Illegal Substance<SEP>Microphone Check

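So the titles would be "Beyond the Borders of Sanity" and "Microphone Check".

I can't figure out how to remove everything that comes before the title. Here is my code so far: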

# Checks for the argument, fail if none given
if(songs.txt != 0) {
print STDERR "You must specify the file name as the argument.\n";
exit 4;
}

# Opens the file and assign it to handle INFILE
open(INFILE, 'songs.txt') or die "Cannot open songs.txt: $!.\n";

@data = <INFILE>;

my @lines = map {$_ =~ /^T/ ? ($_ => 1) : ()} @data;

# This loops through each line of the file
#while($line = <INFILE>) {

#chomp;
#   print $line;
#   print @data; 

#}

# Close the file handle
close INFILE; 
print @lines;

Output:

1TRMMCAU128F9332597<SEP>SOEEWIZ12AB0182B09<SEP>YGGDRASIL<SEP>Beyond the Borders of Sanity1

I realize the 1 doesn't do anything; I was just playing around. Any help is greatly appreciated. Thanks.

Answer 1

Use the split function:

@songs = map { chomp; (split /<SEP>/)[3] } @data;

This assumes that <SEP> appears literally in the file and that you want the fourth delimited field, as the sample data suggests.
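For completeness, here is a minimal self-contained sketch of the same split approach. It is only an illustration under a few assumptions: the two sample lines from the question are held in an array in place of the contents of songs.txt, every line has at least four fields, and the helper name title_from_line is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fourth <SEP>-separated field of one line.
sub title_from_line {
    my ($line) = @_;
    chomp $line;                          # drop the trailing newline first
    return (split /<SEP>/, $line)[3];     # fields are 0-indexed, so [3] is the fourth
}

# The two sample lines from the question stand in for the contents of songs.txt.
my @data = (
    "TRMMCAU128F9332597<SEP>SOEEWIZ12AB0182B09<SEP>YGGDRASIL<SEP>Beyond the Borders of Sanity\n",
    "TRMMCCS12903CBEA4A<SEP>SOARHKB12AB0189EEA<SEP>Illegal Substance<SEP>Microphone Check\n",
);

my @songs = map { title_from_line($_) } @data;
print "$_\n" for @songs;
```

With a real file you would fill @data from a file handle, as in the question. A line with fewer than four fields would yield undef here, so real data may need a check before printing.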

Answer 2

Your data looks like it comes from the Million Song Dataset, which uses a literal <SEP> as the field separator. To get the last field, the song title, you can do the following:

use strict;
use warnings;

@ARGV or die "You must specify the file name as the argument.\n";

while (<>) {
    print $1 if /([^>]+)$/;
}

Usage: perl script.pl songs.txt [>outFile.txt]

The last, optional parameter directs the output to a file.

Output on your dataset:

Beyond the Borders of Sanity
Microphone Check

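The regex matches all the characters at the end of the line that are not >, and captures them. If the match succeeds, the capture (stored in $1) is printed.

Hope this helps!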
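To see the capture in isolation, here is a small sketch that applies the same pattern to the two sample lines, held in an array as a stand-in for the file (the array and the variable names are assumptions for illustration). Note that the captured text still includes the line's trailing newline, which is why the loop above prints each title on its own line without adding "\n". This variant chomps the capture and collects the titles instead.

```perl
use strict;
use warnings;

# Sample lines from the question, used here in place of reading songs.txt.
my @data = (
    "TRMMCAU128F9332597<SEP>SOEEWIZ12AB0182B09<SEP>YGGDRASIL<SEP>Beyond the Borders of Sanity\n",
    "TRMMCCS12903CBEA4A<SEP>SOARHKB12AB0189EEA<SEP>Illegal Substance<SEP>Microphone Check\n",
);

my @titles;
for my $line (@data) {
    # Capture the run of non-'>' characters that ends the line.
    if ($line =~ /([^>]+)$/) {
        my $title = $1;
        chomp $title;            # $1 still carries the trailing newline
        push @titles, $title;
    }
}

print "$_\n" for @titles;
```

One caveat with this pattern: a title that itself contains a > character would be cut off at that point, whereas splitting on the full <SEP> string would not have that problem.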
