Question

I want to parse unique strings with LWP::UserAgent. When I parse the strings from the URL and save them in a file, I actually get something like:

https://facebook.com/hello
http://google.com
https://facebook.com/hello
https://facebook.com/hello
http://google.com

Is there a way to print each string to the file only once:

https://facebook.com/hello
http://google.com

There are more than 1000 strings, so checking them by hand is absurd.

Answer 1

If you want to collapse duplicates, a hash is the tool for the job.

#!/usr/bin/env perl

use strict;
use warnings;

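my %seen;   # counts how many times each line has been read

while ( <DATA> ) {
   # the count is 0 (false) only the first time a line appears
   print unless $seen{$_}++;
}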

__DATA__
https://facebook.com/hello
http://google.com
https://facebook.com/hello
https://facebook.com/hello
http://google.com

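This iterates over the special DATA filehandle, which is inlined here only as an example; you would use your opened URL file instead. It then tests whether the current line is already in the %seen hash, and skips the line if it is.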

It doesn't do any sorting, though; it just prints the first instance of each line, in input order.
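
To adapt this to real files rather than DATA, the same loop can be wrapped in a small subroutine that takes two filehandles. This is only a sketch: the subroutine name print_unique and the file names urls.txt and unique_urls.txt are hypothetical and not part of the question. It also chomps each line, so a final line without a trailing newline is still recognised as a duplicate.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Copy lines from $in_fh to $out_fh, keeping only the first occurrence of each.
sub print_unique {
    my ($in_fh, $out_fh) = @_;
    my %seen;
    while ( my $line = <$in_fh> ) {
        chomp $line;              # compare lines without their newline
        next if $seen{$line}++;   # already written once, skip it
        print {$out_fh} "$line\n";
    }
    return;
}

# Hypothetical file names, for illustration only.
my ($in_file, $out_file) = ('urls.txt', 'unique_urls.txt');
if ( -e $in_file ) {
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print_unique($in, $out);
    close $out or die "Cannot close $out_file: $!";
}
```

Because the subroutine only needs filehandles, it could equally be given STDIN and STDOUT.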

Answer 2

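A hash is the best solution for checking for duplicates as you retrieve the input. If you already have an array full of strings and only want one of each, use uniq from List::Util: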

use strict;
use warnings;
use List::Util 1.45 'uniq';
my @urls = qw(https://facebook.com/hello http://google.com https://facebook.com/hello https://facebook.com/hello http://google.com);
print "$_\n" foreach uniq @urls;

Result:

https://facebook.com/hello
http://google.com
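
If the output file already holds strings from an earlier run, the hash approach can be extended by loading the existing lines into %seen before appending. The following is a minimal sketch under that assumption; the subroutine name append_new_lines is hypothetical, and it assumes the file stores one string per line and ends with a newline.

```perl
use strict;
use warnings;

# Append to $file only those strings in @$new that are not already lines in it.
sub append_new_lines {
    my ($file, $new) = @_;
    my %seen;

    # Remember every line already in the file (a missing file is fine).
    if ( open my $in, '<', $file ) {
        while ( my $line = <$in> ) {
            chomp $line;
            $seen{$line} = 1;
        }
        close $in;
    }

    open my $out, '>>', $file or die "Cannot append to $file: $!";
    for my $url (@$new) {
        next if $seen{$url}++;    # in the file already, or repeated in @$new
        print {$out} "$url\n";
    }
    close $out or die "Cannot close $file: $!";
    return;
}
```

Calling it twice with the same list, for example append_new_lines('urls.txt', \@urls), should leave the file unchanged after the first call.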

How to avoid saving parsed strings that already exist in the file
