Need to update a Perl script... not familiar with Perl

Time: 2011-07-13 17:23:11

Tags: xml perl

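OK... we have a contact book in Exchange that is exported to an XML file on our intranet, for use in our Associate Directory. "Stuff" happened that set off a series of events, and the upshot is that the XML layout was updated.

Apparently, our Squirrel Mail server uses a Perl script to convert this XML into the global.abook.

I'm not familiar with Perl, but the general idea seems easy enough to follow: loop through the XML, pull the "nickname", full name, email and title for each person, and add them to global.abook.

I'm sure the OLD XML file did not have the Root\XSD:Schema and Root\DataRoot layout. I'm not sure what the best way to update the script is.

Perl script: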

#!/usr/bin/perl
use strict;

use XML::Parser;
use Data::Dumper;

my $url = 'http://intranet.mycompany.org/directory/directory.xml';
my $output = '/var/lib/squirrelmail/prefs/global.gabook';

my $file = "curl -sS '$url' |";
my $parser = new XML::Parser(Style => 'Tree');
my $tree = $parser->parsefile($file)->[1];

sub extract {
        my ($string, $record) = @_;
        for (my $i = 0; $i < @{$record}.''; $i++) {
                if ($record->[$i] eq $string) {
                        return $record->[$i + 1][2];
                }
        }
        return undef;
}

open FILE, "> $output"
        or die "Couldn't open: $!";
for (my $i = 4; $i < @{$tree}.''; $i += 4) {
        my $record = $tree->[$i];
        my $full = &extract('DisplayName', $record);
        my $title = &extract('JobTitle', $record);
        my $email = &extract('EMailDisplayName', $record);
        next unless($email);
        my $nickname;
        # Nickname is the first part of the email address
        if ($email =~ /^(\w+)\@/) {
                $nickname = $1;
        }
        print FILE "$nickname|$full||$email|$title" . "\n";
}
close FILE

XML file:

<?xml version="1.0" standalone="yes"?>
<root xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:od="urn:schemas-microsoft-com:officedata">
  <xsd:schema>
  ...
  </xsd:schema>
  <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" generated="2011-07-12T14:14:13">
    <ROW>
      <DisplayName>John Doe</DisplayName>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <JobTitle>I.D. 10 Technologist</JobTitle>
      <Company>My Company</Company>
      <Department>Administration</Department>
      <FileAs>Doe, John</FileAs>
      <BusinessPhone>(800) 867-5309</BusinessPhone>
      <EMailAddress>jdoe@mycompany.org</EMailAddress>
      <EMailAddressType>SMTP</EMailAddressType>
      <EMailDisplayName>jdoe@mycompany.org</EMailDisplayName>
      <Initials>J.D.</Initials>
      <Private>0</Private>
    </ROW>
    <ROW>
      ...
    </ROW>
  </dataroot>
</root>

Desired text file:

jdoe|John Doe||jdoe@mycompany.org|I.D. 10 Technologist
...
...

2 answers:

Answer 0: (score: 3)

XML::Parser is rather arcane. I use XML::LibXML.

#!/usr/bin/perl
use strict;

use XML::LibXML               qw( );
use XML::LibXML::XPathContext qw( );

my $xml = <<'__EOI__';
<?xml version="1.0" standalone="yes"?>
<root xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:od="urn:schemas-microsoft-com:officedata">
  <xsd:schema>
  ...
  </xsd:schema>
  <dataroot xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" generated="2011-07-12T14:14:13">
    <ROW>
      <DisplayName>John Doe</DisplayName>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <JobTitle>I.D. 10 Technologist</JobTitle>
      <Company>My Company</Company>
      <Department>Administration</Department>
      <FileAs>Doe, John</FileAs>
      <BusinessPhone>(800) 867-5309</BusinessPhone>
      <EMailAddress>jdoe@mycompany.org</EMailAddress>
      <EMailAddressType>SMTP</EMailAddressType>
      <EMailDisplayName>jdoe@mycompany.org</EMailDisplayName>
      <Initials>J.D.</Initials>
      <Private>0</Private>
    </ROW>
  </dataroot>
</root>
__EOI__

sub get_text { map $_->textContent, @_ }

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $root = $doc->documentElement();

for my $row ($root->findnodes('/root/dataroot/ROW')) {
   my ($name)  = get_text( $row->findnodes('DisplayName') );
   my ($title) = get_text( $row->findnodes('JobTitle') );
   my ($email) = get_text( $row->findnodes('EMailDisplayName') );

   if (!defined($name) || !defined($title) || !defined($email)) {
      warn("Bad record\n");
      next;
   }

   my ($nick) = $email =~ /^([^@]*)/;

   print("$nick|$name||$email|$title\n");
}
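
The nickname step is where this script differs from the original one. The original pattern `^(\w+)\@` stops at the first non-word character, so it does not match an address such as `john.doe@mycompany.org`, while `^([^@]*)` takes everything before the `@`. A minimal sketch of that step in isolation, using only core Perl; the helper name `abook_line` and the second sample record are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: build one address-book line from three fields.
# Field order follows the question: nickname|full name||email|title
sub abook_line {
    my ($name, $email, $title) = @_;
    # Everything before the first "@" becomes the nickname.
    my ($nick) = $email =~ /^([^\@]*)/;
    return join('|', $nick, $name, '', $email, $title);
}

print abook_line('John Doe', 'jdoe@mycompany.org', 'I.D. 10 Technologist'), "\n";
print abook_line('Jane Roe', 'jane.roe@mycompany.org', 'Analyst'), "\n";
```

The empty string in the `join` produces the doubled `||` seen in the desired output.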

Answer 1: (score: 2)

Is this what you're looking for?

use strict;
use warnings;
use XML::Simple;
use LWP::Simple;


my $url = 'http://intranet.mycompany.org/directory/directory.xml';
my $outfile = '/var/lib/squirrelmail/prefs/global.gabook';


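my $xml = get( $url );
die "Couldn't fetch $url" unless defined $xml;
# ForceArray keeps ROW an array ref even when the file holds a single record
my $structure = XMLin( $xml, ForceArray => ['ROW'] );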

open my $out_fh, '>', $outfile or die $!;
foreach my $row ( @{ $structure->{dataroot}{ROW} } ) {
    next unless exists $row->{FileAs} and defined $row->{FileAs};
    my( $email, $name, $title ) = map{
        warn "Warning: $_ is undefined for $row->{FileAs}."
            unless exists $row->{$_} and defined $row->{$_};
        $row->{$_} || '';
    } qw/ EMailAddress DisplayName JobTitle /;
    my $nick;
    if( $email =~ m/^([^@]+)@/ ) {
        $nick = $1;
    } else {
        $nick = '';
        warn "Warning: No nickname for $row->{FileAs}.";
    }
    print $out_fh "$nick|$name||$email|$title\n";
}

close $out_fh or die $!;

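If your XML isn't very complex, XML::Simple is an easy solution. Also, I don't see the need to shell out to curl when you can use LWP::Simple from within Perl. If you prefer, you could easily modify the above so that its dependencies are closer to those of the original script. My use of LWP::Simple could be replaced with your curl.

I've added on-screen warnings and a default behavior for cases where a particular field is empty or doesn't exist. For example, if a given row is missing EMailAddress, you'll get a couple of warnings. A default empty string is still inserted in that column position, so the script recovers gracefully. If you consider this a serious problem, you can change warn to die.

I'm also skipping any ROW that doesn't have a FileAs tag defined, on the assumption that at least that one tag must be present for a record to be valid. You can change that to taste, but I'd keep some form of graceful "move on" code for a record that isn't valid. Just in case.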
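
The warn-or-die choice can be isolated in one small function. A sketch using only core Perl (5.10 or later for `//`); `field_or_default` and its `$strict` flag are hypothetical names, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the field's value, or '' with a warning.
# When $strict is true, a missing field is fatal instead.
sub field_or_default {
    my ($row, $field, $strict) = @_;
    return $row->{$field}
        if defined $row->{$field} && length $row->{$field};
    my $msg = "$field is undefined for " . ($row->{FileAs} // '(unknown)') . "\n";
    die $msg if $strict;
    warn "Warning: $msg";
    return '';
}

my %row = (FileAs => 'Doe, John', DisplayName => 'John Doe');
print field_or_default(\%row, 'DisplayName'), "\n";     # value is present
print '[', field_or_default(\%row, 'JobTitle'), "]\n";  # warns, yields ''
```

Passing a true third argument turns the same missing field into a fatal error, which is the "change warn to die" option in one place.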
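
For anyone who would rather keep the original XML::Parser script, the reason it stopped working appears to be the extra nesting. Its main loop treats `$tree->[4]`, `$tree->[8]`, and so on as records, but in the new layout those entries are the contents of `<xsd:schema>` and `<dataroot>`, so no `EMailDisplayName` is found and every iteration is skipped. A sketch of descending one level further, using only core Perl: the nested array is hand-written to imitate what the Tree style returns (an assumption, trimmed to one field per record), and `children` and the second sample record are hypothetical:

```perl
use strict;
use warnings;

# Hand-written stand-in for the content of <root> as XML::Parser's
# Tree style would return it: attribute hash first, then tag/content
# pairs, with text stored under the pseudo-tag 0.
my $tree = [
    {},
    0, "\n",
    'xsd:schema', [ {} ],
    0, "\n",
    'dataroot', [
        {},
        0, "\n",
        'ROW', [ {}, 0, "\n", 'DisplayName', [ {}, 0, 'John Doe' ], 0, "\n" ],
        0, "\n",
        'ROW', [ {}, 0, "\n", 'DisplayName', [ {}, 0, 'Jane Roe' ], 0, "\n" ],
        0, "\n",
    ],
    0, "\n",
];

# Return the content array of every child element called $name.
sub children {
    my ($name, $content) = @_;
    my @found;
    # Index 0 is the attribute hash; pairs start at index 1.
    for (my $i = 1; $i < $#{$content}; $i += 2) {
        push @found, $content->[$i + 1] if $content->[$i] eq $name;
    }
    return @found;
}

my @names;
for my $dataroot (children('dataroot', $tree)) {
    for my $row (children('ROW', $dataroot)) {
        # Index 2 is the first text node, as in the original extract()
        my ($field) = children('DisplayName', $row);
        push @names, $field->[2] if $field;
    }
}
print "$_\n" for @names;
```

Looking elements up by name rather than by fixed position also means the loop no longer depends on where whitespace text nodes happen to fall.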