Finding font properties in an MS Word document using Perl

Time: 2013-03-21 12:25:40

Tags: perl

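What I need to do is read a Word file and, based on the font properties of each piece of text, put a tag in front of it that marks it as a heading or a paragraph. However, I need to do this in Perl. Is that possible? Any help would be appreciated. Thanks!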

3 answers:

Answer 0: (score: 4)

@Nikita, this should show you in detail how it can be done:

#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;
use Win32::OLE::Const 'Microsoft Word';
#$Win32::OLE::CP = CP_UTF8;
binmode STDOUT, ':encoding(UTF-8)';

# OPEN FILE SPECIFIED AS FIRST ARGUMENT
my $fname = $ARGV[0];
die "Usage: $0 FILE\n" unless defined $fname;
unless (-e $fname) { print "Error: file does not exist\n"; exit 1; }

# Convert to an absolute Windows path (the script runs under Cygwin)
my $fnameFullPath = `cygpath.exe -wa "$fname"`;
$fnameFullPath =~ s/\s*$//;
$fnameFullPath =~ s/\\/\\\\/g;

# STARTING OLE
my $Word = Win32::OLE->GetActiveObject('Word.Application')
    || Win32::OLE->new('Word.Application','Quit')
    or die Win32::OLE->LastError();

$Word->{'Visible'} = 0;
my $doc = $Word->Documents->Open($fnameFullPath)
    or die Win32::OLE->LastError();
my $paragraphs = $doc->Paragraphs();
my $enumerate = Win32::OLE::Enum->new($paragraphs);

# PROCESSING PARAGRAPHS
while (defined(my $paragraph = $enumerate->Next())) {

    my $range = $paragraph->{Range};
    my $text = $range->{Text};
    # Read the font from the paragraph's own range; $Word->Selection->Font
    # only describes the text at the cursor, whichever paragraph is current.
    my $font = $range->{Font};

    # Report only paragraphs set in 18 pt
    if ($font->{Size} == 18) {
        print "Text: ", $text, "\n";
        print "Font Bold: ", $font->{Bold}, "\n";
        print "Font Italic: ", $font->{Italic}, "\n";
        print "Font Name: ", $font->{Name}, "\n";
        print "Font Size: ", $font->{Size}, "\n";
        print "=========\n";
    }
}

# CLOSING OLE
$doc->Close(0);    # 0 = wdDoNotSaveChanges
$Word->Quit;

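The output posted with this answer is shown below. Note that every paragraph is reported as Times New Roman, 18 pt, even where the text itself says otherwise: those values were read from $Word->Selection, which reflects the cursor position rather than the paragraph being processed.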

Text: This is a doc file containing different fonts and size, document also contain header and footer (Font: TNR, Size: 18)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
Text: This is a Perl example (Font TNR, Size: 12)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
Text: This is a Python example(Font: Courier New, Size: 10)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
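
The script above prints the font properties but does not yet put a tag in front of the text, which is what the question asks for. A minimal sketch of that step, in plain Perl so it can be tried without Word, might look like the following. The function name `tag_paragraph`, the `<H>`/`<P>` tags, and the 18 pt threshold are all assumptions made for illustration; the font hash stands in for the values read from `Range->Font`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule: a paragraph counts as a heading when its font size
# reaches this threshold. Adjust it to the document being processed.
my $HEADING_SIZE = 18;

# Word reports wdUndefined (9999999) when a range mixes several sizes.
my $WD_UNDEFINED = 9999999;

# Return the paragraph text with a tag in front of it.
# $font is a plain hash holding the Size value read from Word.
sub tag_paragraph {
    my ($text, $font) = @_;
    $text =~ s/[\r\n]+\z//;    # drop the paragraph mark Word appends
    my $size = $font->{Size};
    my $is_heading = $size != $WD_UNDEFINED && $size >= $HEADING_SIZE;
    return ($is_heading ? '<H>' : '<P>') . $text;
}

print tag_paragraph("Chapter one\r", { Size => 18 }), "\n";   # <H>Chapter one
print tag_paragraph("Body text\r",   { Size => 12 }), "\n";   # <P>Body text
```

Inside the paragraph loop, the same function could be called with the paragraph's text and a hash built from its font properties. Paragraphs with mixed sizes are treated as body text here, which is only one possible choice.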

Answer 1: (score: 2)

I would need more information to tell which words you need to process. In my example I simply search for the text "Some" in my *.docx file.

#!/usr/bin/perl

use Modern::Perl;
use File::Spec;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;

print "Starting Word\n";

my $Word = Win32::OLE->GetActiveObject('Word.Application') ||
           Win32::OLE->new('Word.Application');
$Word->{'Visible'}     = 1;
$Word->{DisplayAlerts} = 0;

# Word resolves relative paths against its own working directory,
# so pass an absolute path instead
my $path = File::Spec->rel2abs("./fonts.docx");
my $File = $Word->Documents->Open($path) or die Win32::OLE->LastError;

# Move the cursor to the start of the document, then search for "Some"
$Word->Selection->HomeKey(wdStory);

$Word->Selection->Find->{'Text'} = 'Some';

# Execute returns true when a match was found and selected
if ($Word->Selection->Find->Execute()) {
    say "Font size: [", $Word->Selection->Font->Size(), "]";
    say "Font name: [", $Word->Selection->Font->Name(), "]";
} else {
    say "Text not found";
}

$Word->Quit;

Answer 2: (score: 0)

Try OLE automation; the Win32::OLE module is helpful here. This approach does require a deeper knowledge of the Word OLE API.