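I have a bunch of Netflix orders in my brokerage account. I accidentally entered two duplicate GTC sell orders, on 1/5 and 1/6. How can I detect this with a Perl script?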
Buy NFLX 50 @ 315.00 Reg-Acct Fake
Buy NFLX 50 @ 317.50 Reg-Acct OPEN 01/13/15
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/05/15
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/06/15
Sell NFLX 50 @ 362.00 Reg-Acct OPEN 11/25/14
...
Sell NFLX 50 @ 345.00 IRA-Acct OPEN 09/15/14
I would like the script to print only these two lines, which count as the same when comparing fields[0] through fields[6]:
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/05/15
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/06/15
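I would prefer a simple script (i.e. no one-liners, no hashes), since I am new to Perl.
Thanks, Larry
Answer 0: (score: 1)
I know you said no one-liners, but in case you only meant no Perl one-liners: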
sort filename|rev|uniq -D -f 1|rev
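If you would rather stay in Perl, the same sort-then-compare-neighbours idea can be written without a hash. This is only a sketch: the helper names order_key and find_duplicates are made up for illustration, and it assumes the date is always the last whitespace-separated field, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: the key of an order line is every field except the last one (the date).
sub order_key {
    my ($line) = @_;
    my @fields = split ' ', $line;
    pop @fields;                      # drop the trailing date field
    return join ' ', @fields;
}

# Hypothetical helper: sort the lines by key, then keep every line whose key
# equals the key of the line just before or just after it.
sub find_duplicates {
    my @lines  = @_;
    my @sorted = sort { order_key($a) cmp order_key($b) or $a cmp $b } @lines;
    my @dups;
    for my $i (0 .. $#sorted) {
        my $key          = order_key($sorted[$i]);
        my $same_as_prev = $i > 0        && order_key($sorted[$i - 1]) eq $key;
        my $same_as_next = $i < $#sorted && order_key($sorted[$i + 1]) eq $key;
        push @dups, $sorted[$i] if $same_as_prev or $same_as_next;
    }
    return @dups;
}

my @sample = (
    'Buy NFLX 50 @ 315.00 Reg-Acct Fake',
    'Buy NFLX 50 @ 317.50 Reg-Acct OPEN 01/13/15',
    'Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/05/15',
    'Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/06/15',
    'Sell NFLX 50 @ 362.00 Reg-Acct OPEN 11/25/14',
    'Sell NFLX 50 @ 345.00 IRA-Acct OPEN 09/15/14',
);
print "$_\n" for find_duplicates(@sample);
```

With the sample data this should print only the two Reg-Acct sell orders at 345.00. Sorting first means each line only has to be compared with its neighbours, which is the same trick the sort/uniq pipeline relies on.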
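Answer 1: (score: 0)
I would prefer a simple script (no hashes)
Sigh. I missed the no-hashes part. Unfortunately, simple and no hashes are opposing goals, not to mention that no hashes means inefficient, i.e. slow. See the code further down for how to do it with a hash. In the meantime, without hashes you need parallel arrays: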
use strict;
use warnings;
use 5.016;
use Data::Dumper;

my @orders;
my @counts;
my $fname = 'data3.txt';

open my $ORDERSFILE, '<', $fname
    or die "Couldn't open $fname: $!";

LINE:
while (my $line = <$ORDERSFILE>) {
    my @pieces = split ' ', $line;
    my $date = pop @pieces;
    my $order = join ' ', @pieces;

    if (not @orders) {  #then length of @orders is 0
        $orders[0] = $order;
        $counts[0] = 1;
        next LINE;
    }

    for my $i (0..$#orders) {
        if ($orders[$i] eq $order) {
            $counts[$i]++;
            next LINE;
        }
    }

    #If execution reaches here, then the order wasn't found in the array...
    my $i = $#counts + 1;
    $orders[$i] = $order;
    $counts[$i] = 1;
}

say Dumper(\@orders);
say Dumper(\@counts);

for my $i (0..$#counts) {
    if ($counts[$i] > 1) {
        say "($counts[$i]) $orders[$i]";
    }
}
--output:--
$VAR1 = [
'Buy NFLX 50 @ 315.00 Reg-Acct',
'Buy NFLX 50 @ 317.50 Reg-Acct OPEN',
'Sell NFLX 50 @ 345.00 Reg-Acct OPEN',
'Sell NFLX 50 @ 362.00 Reg-Acct OPEN',
'Sell NFLX 50 @ 345.00 IRA-Acct OPEN'
];
$VAR1 = [
1,
1,
2,
1,
1
];
(2) Sell NFLX 50 @ 345.00 Reg-Acct OPEN
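The script above reports only the count and the shared part of the order, while the question asks for the original lines including their dates. One way to get those, still without hashes, is a third parallel array that collects the original lines. This is a rough sketch: the sub name duplicate_lines and the array @originals are made up for illustration, and the input lines are assumed to be chomped already.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the original lines of every order seen more than once.
sub duplicate_lines {
    my @input = @_;

    # Three parallel arrays: the same index always refers to the same order.
    my (@orders, @counts, @originals);

    LINE:
    for my $line (@input) {
        my @pieces = split ' ', $line;
        pop @pieces;                          # drop the date
        my $order = join ' ', @pieces;

        for my $i (0 .. $#orders) {
            if ($orders[$i] eq $order) {
                $counts[$i]++;
                $originals[$i] .= "\n$line";  # remember this line as well
                next LINE;
            }
        }

        # Not found: append a new entry to all three arrays.
        push @orders,    $order;
        push @counts,    1;
        push @originals, $line;
    }

    my @dups;
    for my $i (0 .. $#counts) {
        push @dups, split(/\n/, $originals[$i]) if $counts[$i] > 1;
    }
    return @dups;
}

my @sample = (
    'Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/05/15',
    'Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/06/15',
    'Sell NFLX 50 @ 362.00 Reg-Acct OPEN 11/25/14',
);
print "$_\n" for duplicate_lines(@sample);
```

Because 0 .. $#orders is an empty range while @orders is still empty, this version does not need a special case for the first line.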
Here are some better solutions:
use strict;
use warnings;
use 5.016;
use Data::Dumper;

my %dates_for;  #A key will be an order; a value will be a reference to an array of dates.

while (my $line = <DATA>) {
    my @pieces = split ' ', $line;
    my $date = pop @pieces;
    my $order = join ' ', @pieces;
    push @{$dates_for{$order}}, $date;  #autovivification (see explanation below)
}

say Dumper(\%dates_for);

my @dates;
for my $order (keys %dates_for) {
    @dates = @{$dates_for{$order}};
    my $dup_count = @dates;
    if ($dup_count > 1) {
        say "($dup_count) $order";
        say " $_" for @dates;
    }
}

__DATA__
Buy NFLX 50 @ 315.00 Reg-Acct Fake
Buy NFLX 50 @ 317.50 Reg-Acct OPEN 01/13/15
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/05/15
Sell NFLX 50 @ 345.00 Reg-Acct OPEN 01/06/15
Sell NFLX 50 @ 362.00 Reg-Acct OPEN 11/25/14
Sell NFLX 50 @ 345.00 IRA-Acct OPEN 09/15/14
--output:--
$VAR1 = {
'Sell NFLX 50 @ 345.00 IRA-Acct OPEN' => [
'09/15/14'
],
'Sell NFLX 50 @ 345.00 Reg-Acct OPEN' => [
'01/05/15',
'01/06/15'
],
'Buy NFLX 50 @ 317.50 Reg-Acct OPEN' => [
'01/13/15'
],
'Buy NFLX 50 @ 315.00 Reg-Acct' => [
'Fake'
],
'Sell NFLX 50 @ 362.00 Reg-Acct OPEN' => [
'11/25/14'
]
};
(2) Sell NFLX 50 @ 345.00 Reg-Acct OPEN
01/05/15
01/06/15
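When an undefined variable is dereferenced, it gets silently upgraded to an array or hash reference (depending on the type of the dereferencing). This behaviour is called autovivification and usually does what you mean (e.g. when you store a value)....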
http://search.cpan.org/~vpit/autovivification-0.14/lib/autovivification.pm
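To make the quoted description concrete, here is a small sketch of what the push in the hash solution does behind the scenes. The key 'order A' and the second hash %explicit are made up for illustration.

```perl
use strict;
use warnings;

my %dates_for;    # empty hash, no keys yet

# $dates_for{'order A'} does not exist yet, so its value is undef.
# Dereferencing it as an array makes Perl create an empty array reference first.
push @{ $dates_for{'order A'} }, '01/05/15';
push @{ $dates_for{'order A'} }, '01/06/15';

print ref($dates_for{'order A'}), "\n";           # ARRAY
print scalar(@{ $dates_for{'order A'} }), "\n";   # 2

# The same first push written out without relying on autovivification:
my %explicit;
$explicit{'order A'} = [] unless exists $explicit{'order A'};
push @{ $explicit{'order A'} }, '01/05/15';
```

The explicit form is what you would otherwise have to write for every new order, which is why the one-line push is the usual idiom.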
For fixed-width columns, using unpack() is more efficient:
use strict;
use warnings;
use 5.016;
use Data::Dumper;

my $fname = 'data3.txt';

open my $ORDERSFILE, '<', $fname
    or die "Couldn't open $fname: $!";

my %dates_for;

while (my $line = <$ORDERSFILE>) {
    my ($order, $date) = unpack 'A41 @55 A*', $line;  #see explanation below
    push @{$dates_for{$order}}, $date;
}

close $ORDERSFILE;

say Dumper(\%dates_for);

my @dates;
for my $order (keys %dates_for) {
    @dates = @{$dates_for{$order}};
    if (@dates > 1) {
        my $dup_count = @dates;
        say "($dup_count) $order";
        say " $_" for @dates;
    }
}
--output:--
$VAR1 = {
' Buy NFLX 50 @ 317.50 Reg-Acct OPEN' => [
'01/13/15'
],
'Sell NFLX 50 @ 362.00 Reg-Acct OPEN' => [
'11/25/14'
],
'Sell NFLX 50 @ 345.00 Reg-Acct OPEN' => [
'01/05/15',
'01/06/15'
],
' Buy NFLX 50 @ 315.00 Reg-Acct Fake' => [
''
],
'Sell NFLX 50 @ 345.00 IRA-Acct OPEN' => [
'09/15/14'
]
};
(2) Sell NFLX 50 @ 345.00 Reg-Acct OPEN
01/05/15
01/06/15
A41 @55 A*
=> extract 41 characters (A41),
   jump to position 55 (@55),
   extract the remaining characters (A*)
You can jump to any position you want, forward or backward, which means you can extract the pieces in any order you want.
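As a small illustration of jumping backward, the sketch below pulls three pieces out of a record in reverse order. The record layout (action at offsets 0-3, symbol at 5-8, quantity at 10-11) is made up for this example and is not the layout of data3.txt.

```perl
use strict;
use warnings;

# Made-up fixed-width record: action at offsets 0-3, symbol at 5-8, quantity at 10-11.
my $record = 'Sell NFLX 50';

# Jump to offset 10 first, then back to offset 5, then back to offset 0.
my ($qty, $symbol, $action) = unpack '@10 A2 @5 A4 @0 A4', $record;

print "$action $symbol $qty\n";   # Sell NFLX 50
```

Each @N sets the read position to an absolute offset from the start of the string, so the order of the pieces in the template is independent of their order in the record.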