Simple text-processing task in Perl / Visual Basic

Time: 2012-02-25 22:16:08

Tags: vb.net perl

I have a long string containing a bibliography, in which lines alternate between a paper title and a comma-separated list of authors, like this:

Learning Programs: A Bayesian Approach
P. Liang, M. Jordan, D. Klein
Variational methods for a Dirichelet process
D. Blei, M. Jordan

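What I want is a list of unique authors (sorted alphabetically by last name) together with a count for each. For the example above, it would be: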

D. Blei (1)
M. Jordan (2)
D. Klein (1)
P. Liang (1)

Can anyone tell me how to do this in Perl or Visual Basic? Thanks a lot, you rock!

2 answers:

Answer 0: (score: 1)

This works for me:

#!/usr/bin/perl
use strict;
use warnings;

### collecting all the authors, using them as hash slice keys for quick count
my %author_count;
while (<DATA>) {
  chomp( my $authors_line = <DATA> );
  $_++ for @author_count{split /, /, $authors_line};
}

### printing the resulting hash
### sorting by substr (skipping the "X. " initial) was sufficient for the test cases,
### but it could of course be replaced by a regex
print "$_ ($author_count{$_})", "\n" 
  for sort { (substr $a, 3) cmp (substr $b, 3) } keys %author_count;

__DATA__
Learning Programs: A Bayesian Approach
P. Liang, M. Jordan, D. Klein
Variational methods for a Dirichelet process
D. Blei, M. Jordan
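
The `substr $a, 3` sort key above assumes every name starts with exactly one initial followed by a period and a space. As a minimal sketch of the regex alternative mentioned in the comments, the surname could be taken as the last whitespace-separated word of the name. The helper names `surname` and `sort_by_surname` are hypothetical, and the "last word is the surname" rule is an assumption that will not hold for every naming convention:

```perl
use strict;
use warnings;

# Hypothetical helper: treat the last whitespace-separated word as the surname.
sub surname {
    my ($name) = @_;
    my ($last) = $name =~ /(\S+)\s*$/;
    return defined $last ? $last : $name;   # fall back to the whole name
}

# Hypothetical helper: sort names by surname, using the full name to break ties.
# Call it in list context.
sub sort_by_surname {
    return sort { surname($a) cmp surname($b) or $a cmp $b } @_;
}

print "$_\n" for sort_by_surname('P. Liang', 'J. R. R. Tolkien', 'D. Blei');
```

With this key, a name with several initials such as "J. R. R. Tolkien" is ordered under "Tolkien", whereas the fixed `substr` offset would compare it as "R. R. Tolkien".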

Answer 1: (score: 1)

In Perl, what you need to do is read the input so that every second line is treated as the author line:

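use strict;
use warnings;

# Map each author to the list of titles they appear on.
my %list;
while (<DATA>) {
    chomp;
    my $book = $_;
    chomp(my $authors = <DATA>);
    push @{ $list{$_} }, $book for split /,\s*/, $authors;
}

# Sort by surname, then print each author with the number of titles.
for (sort { sortA($a) cmp sortA($b) } keys %list) {
    printf "%s (%d)\n", $_, scalar @{ $list{$_} };
}

# Return the first word after a space (the surname in "P. Liang"),
# or the whole name if there is no space.
sub sortA {
    return $_[0] =~ / (\w+)/ ? $1 : $_[0];
}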

__DATA__
Learning Programs: A Bayesian Approach
P. Liang, M. Jordan, D. Klein
Variational methods for a Dirichelet process
D. Blei, M. Jordan
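
Both answers read from `__DATA__`, while the question describes the bibliography as one long string. Here is a minimal sketch of the same counting idea applied to a string. The function name `count_authors` is hypothetical, and the code assumes that blank lines can be ignored and that the non-blank lines strictly alternate title, authors, title, authors:

```perl
use strict;
use warnings;

# Hypothetical helper: count authors in a bibliography held in one string,
# where title lines and author lines alternate.
sub count_authors {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\r?\n/, $text;   # drop blank lines
    my %count;
    # Author lines are the 2nd, 4th, ... non-blank lines (odd indexes).
    for my $i (grep { $_ % 2 } 0 .. $#lines) {
        (my $line = $lines[$i]) =~ s/^\s+|\s+$//g;     # trim the line
        $count{$_}++ for grep { length } split /\s*,\s*/, $line;
    }
    return \%count;
}

my $bibliography = <<'END';
Learning Programs: A Bayesian Approach
P. Liang, M. Jordan, D. Klein
Variational methods for a Dirichelet process
D. Blei, M. Jordan
END

my $count = count_authors($bibliography);
# Order by the last word of each name, which is assumed to be the surname.
my @names = sort { ($a =~ /(\S+)$/)[0] cmp ($b =~ /(\S+)$/)[0] } keys %$count;
print "$_ ($count->{$_})\n" for @names;
```

Returning a hash reference keeps the counting separate from the output, so the caller can choose a different sort order or output format.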