I'm trying to use Perl to capitalize every lowercase letter that follows a period and a space. Here is an example of the input:
...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...
And this is the output I'd like to see:
...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to...
I've tried several regular expressions without much success, for example:
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;
or
$currentLine =~ s/([\.!?]\s*)(\w)/$1\U$2/g;
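But I'm not getting the expected result. Help!
As someone pointed out, the problem may lie elsewhere, so here is some context. The regex is used in the small script below, which does a few things besides the step this post is about. I'm running it on long SRT files obtained from video closed captions. Thanks again for your help.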
#! perl
use strict;
use warnings;
my $filename = $ARGV[0];
open(INPUT_FILE, $filename)
or die "Couldn't open $filename for reading!";
while (<INPUT_FILE>) {
my $currentLine = $_;
# Remove empty lines and lines that start with digits
if ($currentLine =~ /^[\s+|\d+]/){
next;
}
# Remove all carriage returns
$currentLine =~ s/\R$/ /;
# Convert all letters to lower case
$currentLine =~ s/([A-Z])/\l$1/g;
# Capitalize after period <= STEP THAT DOES NOT WORK
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;
print $currentLine;
}
close(INPUT_FILE);
Answer 0 (score: 3)
Try this.
Use a lookbehind for the context, capture the letter, and use \U to change the start of each sentence to upper case:
$str ="...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...";
$str =~ s/(?<=\w\.\s)(\w)/\U$1/g;
print $str
Or try \K, which keeps everything matched before it out of the replacement:
$str =~ s/\w\.\s\K(\w)/\U$1/g;
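To compare the two substitutions side by side, here is a minimal sketch that wraps each one in a small sub and runs it on a shortened version of the sample sentence. The sub names `cap_lookbehind` and `cap_keep` are made up for illustration, and `\K` assumes Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Lookbehind version: the "word char, period, whitespace" context is
# asserted but not consumed by the match.
sub cap_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<=\w\.\s)(\w)/\U$1/g;
    return $text;
}

# \K version: the context is matched, but left out of the text being replaced.
sub cap_keep {
    my ($text) = @_;
    $text =~ s/\w\.\s\K(\w)/\U$1/g;
    return $text;
}

my $sample = "...so, that's our art. the 4 of us can now have a dialog. we can speak to...";
print cap_lookbehind($sample), "\n";
print cap_keep($sample), "\n";
# Both lines print:
# ...so, that's our art. The 4 of us can now have a dialog. We can speak to...
```

One difference worth knowing about: with `\K` the context is still consumed by the match, so two one-letter sentences in a row (as in "a. b. c") leave the second letter untouched, while the lookbehind version capitalizes both.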
Answer 1 (score: 0)
One problem is this code:
if ($currentLine =~ /^[\s+|\d+]/){
next;
}
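Contrary to the comment, this skips lines that start with a whitespace character, a digit, a plus sign, or a pipe symbol, because everything between the square brackets is a single character class. That may be leading you astray. You probably meant to write: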
next if /^(\s+$|\d)/;
which skips a line if the whole line is whitespace, or if its first character is a digit.
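The difference between the two tests is easy to see on a handful of sample lines. The following is a small sketch; the sub names and the sample lines are invented for the demonstration.

```perl
use strict;
use warnings;

# Original test: a bracketed character class, so "+" and "|" are literal members.
sub skipped_by_class {
    my ($line) = @_;
    return $line =~ /^[\s+|\d+]/ ? 1 : 0;
}

# Suggested test: a whitespace-only line, or a line with a leading digit.
sub skipped_by_alternation {
    my ($line) = @_;
    return $line =~ /^(\s+$|\d)/ ? 1 : 0;
}

for my $line ("\n", "42\n", "  indented text\n", "+ plus sign\n", "| pipe\n", "plain text\n") {
    (my $shown = $line) =~ s/\n\z//;    # strip the newline for display only
    printf "%-18s class=%d alternation=%d\n",
        "'$shown'", skipped_by_class($line), skipped_by_alternation($line);
}
```

Only the blank line and the line starting with a digit are skipped by both tests; the indented line and the lines starting with "+" or "|" are skipped by the character class alone.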
You can simplify and generalize your loop like this:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
# Remove empty lines and lines that start with digits. sometimes
next if /^(\s+$|\d)/;
# Remove all carriage returns. forever
s/\R$//;
# Convert all letters to lower case. always
s/([A-Z])/\l$1/g;
# Capitalize after period <=... STEP THAT DOES NOT WORK
s/\.\s([a-z])/\. \u$1/g;
print "$_\n";
}
When run on itself, the output is:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
# remove empty lines and lines that start with digits. Sometimes
next if /^(\s+$|\d)/;
# remove all carriage returns. Forever
s/\r$//;
# convert all letters to lower case. Always
s/([a-z])/\l$1/g;
# capitalize after period <=... Step that does not work
s/\.\s([a-z])/\. \u$1/g;
print "$_\n";
}
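Note that for the converted script to work, you would need to use /gi as the modifier (instead of /g) in the substitution operations, since the character classes in it are now lower case. There is still plenty of room for improvement in this code.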
A basic way to test what is going on is to print everything at each step.
while (<INPUT_FILE>) {
print "## $_";
my $currentLine = $_;
# Remove empty lines and lines that start with digits
if ($currentLine =~ /^[\s+|\d+]/){
print "#SKIP# $currentLine";
next;
}
# Remove all carriage returns
$currentLine =~ s/\R$/ /;
print "#EOL# $currentLine##\n";
# Convert all letters to lower case
$currentLine =~ s/([A-Z])/\l$1/g;
print "#LC# $currentLine##\n";
# Capitalize after period <= STEP THAT DOES NOT WORK
$currentLine =~ s/\.\s([a-z])/\. \u$1/g;
print "#CAPS# $currentLine##\n";
print $currentLine; # Needs a newline!
}
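That shows you what is happening, and where it goes wrong. Note that replacing the generic EOL (\R) with a space means the output does not end with a newline. That is also a bad idea, which is why the output I generate ends with a newline: either keep the one read from the file, or add one after removing it.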
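One thing the step-by-step printing may reveal is that a sentence starting on a new subtitle line has no period before it on that same line, so a per-line substitution can never match it. This is an assumption about the SRT input, not something the question confirms. Under that assumption, a minimal sketch is to join the kept lines first and capitalize afterwards; the sub name `normalize_subtitles` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes subtitle lines, returns one cleaned-up string.
sub normalize_subtitles {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        next if $line =~ /^(\s+$|\d)/;   # skip blank lines and lines starting with a digit
        $line =~ s/\R\z//;               # drop the line ending
        push @kept, lc $line;            # lower-case the text
    }
    my $text = join ' ', @kept;          # join first, so breaks across lines become ". "
    $text =~ s/(\.\s+)([a-z])/$1\u$2/g;  # capitalize the letter after a period
    return "$text\n";                    # end the output with a single newline
}

# Made-up sample in SRT layout: counter, timestamp, text, blank line.
my @sample = (
    "1\n", "00:00:01,000 --> 00:00:03,000\n", "SO, THAT'S OUR ART.\n", "\n",
    "2\n", "00:00:03,000 --> 00:00:05,000\n", "THE 4 OF US CAN NOW HAVE A DIALOG.\n",
    "we can speak\n",
);
print normalize_subtitles(@sample);
# prints: so, that's our art. The 4 of us can now have a dialog. We can speak
```

To run it on a real file, the sample array could be replaced with the lines read from that file.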
Also, you should avoid ALL_CAPS filehandles and use lexical filehandles when you need an explicit filehandle.
open my $fh, '<', $filename
or die "Couldn't open $filename for reading!";
Good job including the file name in the error message (though adding $! to report the system error message would be a good idea too).
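Putting those two suggestions together, a sketch of the file handling might look like the following. The helper name `read_lines` is made up for illustration; the three-argument `open` and `$!` are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file with a lexical handle and return its lines.
sub read_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die "Couldn't open $filename for reading: $!";
    my @lines = <$fh>;
    close $fh
        or die "Couldn't close $filename: $!";
    return @lines;
}

# Usage: perl script.pl subtitles.srt
if (@ARGV) {
    print for read_lines($ARGV[0]);
}
```

The lexical handle `$fh` goes out of scope when the sub returns, and the error message now includes both the file name and the reason the system gave for the failure.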
Answer 2 (score: -1)
# (char)(char)(char) (char)(char)(char) Uppercase the 3rd
$str =~ s/(\.)(\s)(\w)/$1$2\U$3/g;
print $str
...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...
...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to...