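I have a text file containing a list of titles that I need to convert to title case (every word should start with a capital letter, except most articles, conjunctions, and prepositions).
For example, this list of book titles: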
barbarians at the gate
hot, flat, and crowded
A DAY LATE AND A DOLLAR SHORT
THE HITCHHIKER'S GUIDE TO THE GALAXY
should be changed to:
Barbarians at the Gate
Hot, Flat, and Crowded
A Day Late and a Dollar Short
The Hitchhiker's Guide to the Galaxy
I wrote the following code:
while(<DATA>)
{
$_=~s/(\s+)([a-z])/$1.uc($2)/eg;
print $_;
}
But it capitalizes the first letter of every word, even words such as "at", "the", and "a" in the middle of a title:
Barbarians At The Gate
Hot, Flat, And Crowded
A Day Late And A Dollar Short
The Hitchhiker's Guide To The Galaxy
How can I do this?
Answer 0 (score: 4)
Thanks to Håkon Hægland, whose comment ("See also Lingua::EN::Titlecase") pointed me to a way to get the desired output.
use Lingua::EN::Titlecase;
while (my $line = <DATA>)
{
    # Build a title-caser for the current line; printing the object
    # stringifies it to the title-cased text.
    my $tc = Lingua::EN::Titlecase->new($line);
    print $tc;
}
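If installing a CPAN module is not an option, the same idea can be roughly approximated in core Perl. The following is only a minimal sketch, not what Lingua::EN::Titlecase does internally: the %small word list and the title_case helper are hypothetical names chosen for illustration, the list is deliberately short, and runs of whitespace are collapsed to single spaces.

```perl
use strict;
use warnings;

# Hypothetical list of words to keep lowercase; extend it as needed.
my %small = map { $_ => 1 } qw(a an and at but by for in of on or the to);

# Lowercase the whole title, then capitalize every word except the
# small words. The first word is always capitalized.
sub title_case {
    my ($title) = @_;
    my @words = split ' ', lc $title;
    for my $i (0 .. $#words) {
        next if $i > 0 && $small{ $words[$i] };
        $words[$i] = ucfirst $words[$i];
    }
    return join ' ', @words;
}

my @titles = (
    'barbarians at the gate',
    'hot, flat, and crowded',
    'A DAY LATE AND A DOLLAR SHORT',
    "THE HITCHHIKER'S GUIDE TO THE GALAXY",
);
print title_case($_), "\n" for @titles;
```

A small word with punctuation attached (for example "the,") will not be found in the hash and will be capitalized, so this sketch suits simple lists like the one in the question rather than arbitrary text.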
Answer 1 (score: 0)
You can also try this regex, applied globally and case-insensitively:
^(.)(.*?)\b|\b(at|to|that|and|this|the|a|is|was)\b|\b(\w)([\w'-]*?(?:[^\w'-]|$))
and replace with:
\U$1\L$2$3\U$4\L$5
It works by matching the first letter of each word that is not a connecting word, capitalizing it, and then lowercasing the rest of the word. This seems to work in PHP; I have not tried it in Perl, but it will likely work there too.
^(.)(.*?)\b
matches the first letter of the first word (group 1) and the rest of that word (group 2). This keeps the first word from being left lowercase just because it is an article.
\b(word|multiple words|...)\b
matches any connecting word (group 3), so that it is written back in lowercase instead of being capitalized.
\b(\w)([\w'-]*?(?:[^\w'-]|$))
matches the first letter of any other word (group 4) and the rest of the word (group 5). Here I used [^\w'-] instead of \b so that hyphens and apostrophes are counted as word characters too. This prevents 's from becoming 'S.
The \U in the replacement uppercases the characters that follow and \L lowercases them. If you want, you can add more articles or other words to the alternation to keep them from being capitalized.
UPDATE: I changed the regex so you can include connecting phrases (multiple words) too. But that will still make for a very long regex...
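For Perl specifically, the alternation approach can be sketched as below. This is an adaptation under stated assumptions rather than a tested port of the answer: the connecting-word group is captured as group 3 and written back in lowercase, the remaining words use groups 4 and 5, a hyphen is allowed inside a word, and the replacement is built with /e instead of \U and \L so that unmatched groups do not trigger warnings. The names $small and title_case_re are hypothetical.

```perl
use strict;
use warnings;

# Connecting words to keep lowercase (same list as in the regex above),
# matched case-insensitively so that "THE" and "the" are both caught.
my $small = qr/at|to|that|and|this|the|a|is|was/i;

sub title_case_re {
    my ($title) = @_;
    $title =~ s{
        ^(.)(.*?)\b                       # first word: groups 1 and 2
      | \b($small)\b                      # connecting word: group 3
      | \b(\w)([\w'-]*?(?:[^\w'-]|$))     # any other word: groups 4 and 5
    }{
        defined $1 ? uc($1) . lc($2)      # always capitalize the first word
      : defined $3 ? lc($3)               # keep connecting words lowercase
      :              uc($4) . lc($5)      # capitalize everything else
    }gex;
    return $title;
}

print title_case_re('A DAY LATE AND A DOLLAR SHORT'), "\n";
```

Only one alternative matches at a time, so checking which capture group is defined tells the replacement code which kind of word it is looking at.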