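I need to read a Word file and, based on the font attributes of its text, add a marker in front of each piece of text so that it is identified as either a heading or a paragraph. However, I need to do this in Perl. Is that possible? Any help would be greatly appreciated. Thanks!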
Answer 0 (score: 4)
@Nikita, this will give you the details of how it can be done:
#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Enum;
use Win32::OLE::Const 'Microsoft Word';
#$Win32::OLE::CP = CP_UTF8;
binmode STDOUT, ':encoding(UTF-8)';

# OPEN FILE SPECIFIED AS FIRST ARGUMENT
my $fname = $ARGV[0];
die "Usage: $0 <word-file>\n" unless defined $fname;
unless (-e $fname) { print "Error: File does not exist\n"; exit 1; }
# Convert the Cygwin path to an absolute Windows path that Word can open
my $fnameFullPath = `cygpath.exe -wa "$fname"`;
$fnameFullPath =~ s/\\/\\\\/g;
$fnameFullPath =~ s/\s*$//;

# STARTING OLE
my $Word = Win32::OLE->GetActiveObject('Word.Application')
    || Win32::OLE->new('Word.Application', 'Quit')
    or die Win32::OLE->LastError();
$Word->{'Visible'} = 0;
my $doc = $Word->Documents->Open($fnameFullPath)
    or die Win32::OLE->LastError();
my $paragraphs = $doc->Paragraphs();
my $enumerate = Win32::OLE::Enum->new($paragraphs);

# PROCESSING PARAGRAPHS
while (defined(my $paragraph = $enumerate->Next())) {
    my $text = $paragraph->{Range}->{Text};
    # Read the font from the paragraph's own range; $Word->Selection only
    # reflects the cursor position, not the paragraph being enumerated.
    my $font = $paragraph->{Range}->{Font};
    if ($font->{Size} == 18) {
        print "Text: ", $text, "\n";
        print "Font Bold: ", $font->{Bold}, "\n";
        print "Font Italic: ", $font->{Italic}, "\n";
        print "Font Name: ", $font->{Name}, "\n";
        print "Font Size: ", $font->{Size}, "\n";
        print "=========\n";
    }
}

# CLOSING OLE
$doc->Close(0);    # 0 = wdDoNotSaveChanges
$Word->Quit;
Output of a run that read the font from `$Word->Selection` (every paragraph therefore reports the same font, whatever its actual formatting):
Text: This is a doc file containing different fonts and size, document also contain header and footer (Font: TNR, Size: 18)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
Text: This is a Perl example (Font TNR, Size: 12)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
Text: This is a Python example(Font: Courier New, Size: 10)
Font Bold: 0
Font Italic: 0
Font Name: Times New Roman
Font Size: 18
=========
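The question asks for a marker in front of each piece of text, which the listing above stops short of. A minimal sketch of that tagging step follows, written as plain Perl so it runs without Word. Everything in it is an assumption for illustration: the `[HEADING]` and `[PARAGRAPH]` markers, the threshold of 16 points, the helper names `tag_for_font` and `tag_paragraph`, and the sample data standing in for values read through OLE. The only Word-specific fact it relies on is that Word reports `wdUndefined` (9999999) for a range with mixed font sizes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical threshold: a font size at or above this counts as a heading.
my $HEADING_MIN_SIZE = 16;

# Word returns wdUndefined (9999999) when a range mixes several font sizes.
my $WD_UNDEFINED = 9999999;

# Return a marker for one paragraph, given a hash of its font attributes.
# The Size key mirrors the Word font property of the same name.
sub tag_for_font {
    my ($font) = @_;
    my $size = $font->{Size};
    # Unknown or mixed sizes are treated as body text (an assumption).
    return '[PARAGRAPH]' if !defined $size || $size == $WD_UNDEFINED;
    return $size >= $HEADING_MIN_SIZE ? '[HEADING]' : '[PARAGRAPH]';
}

# Prefix the paragraph text with its marker.
sub tag_paragraph {
    my ($text, $font) = @_;
    $text =~ s/[\r\n]+\z//;    # Word paragraph text ends with a carriage return
    return tag_for_font($font) . ' ' . $text;
}

# Sample data standing in for values read from the document.
my @samples = (
    { text => "Document title\r",         font => { Size => 18 } },
    { text => "This is a Perl example\r", font => { Size => 12 } },
);
print tag_paragraph($_->{text}, $_->{font}), "\n" for @samples;
```

Inside the paragraph loop, the hash would be filled from the paragraph being processed, for example `{ Size => $paragraph->{Range}->{Font}->{Size} }`, and the tagged text printed or written to a file. A size threshold is only one possible rule; bold or font name could be added to `tag_for_font` in the same way.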
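Answer 1 (score: 2)
I need more information to help you identify which words need to be processed. In my example, I simply search for the text "Some" (this is my *.docx file):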
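#!/usr/bin/perl
use Modern::Perl;
use File::Spec;
use Win32::OLE qw(in with);
use Win32::OLE::Variant;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;    # die on OLE errors

print "Starting Word\n";
my $Word = Win32::OLE->GetActiveObject('Word.Application')
    || Win32::OLE->new('Word.Application');
$Word->{'Visible'} = 1;
$Word->{DisplayAlerts} = 0;

# Word resolves relative paths against its own working directory,
# so pass an absolute path instead
my $File = $Word->Documents->Open( File::Spec->rel2abs("./fonts.docx") )
    or die Win32::OLE->LastError;

# Move to the start of the document, then search for the text
$Word->Selection->HomeKey(wdStory);
$Word->Selection->Find->{'Text'} = 'Some';
if ( $Word->Selection->Find->Execute() ) {
    # After a successful search the selection covers the match
    say "Font size: [", $Word->Selection->Font->Size(), "]";
    say "Font name: [", $Word->Selection->Font->Name(), "]";
}
else {
    say "Text not found";
}

$File->Close(0);    # 0 = wdDoNotSaveChanges
$Word->Quit;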
Answer 2 (score: 0)
Try OLE automation; the Win32::OLE module is helpful. This approach requires a deeper knowledge of the Word OLE API.