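Capitalize every letter that follows a period and a space

Time: 2016-01-03 03:53:23

Tags: regex perl

I am trying to use Perl to capitalize every lowercase letter that follows a period and a space. Here is an example of the input: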

...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...

This is the output I would like to see:

...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to...    

I have tried several regular expressions without much success, for example:

$currentLine =~ s/\.\s([a-z])/\. \u$1/g;

$currentLine =~ s/([\.!?]\s*)(\w)/$1\U$2/g;

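But I am not getting the expected result. Help!

Update

As someone pointed out, the problem may lie elsewhere, so here is some context. The regex is used in the small script below, which does a few things besides the step this post is about. I am running it on long SRT files obtained from video closed captions. Thanks again for your help.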

#! perl
use strict;
use warnings;

my $filename = $ARGV[0];

open(INPUT_FILE, $filename)
    or die "Couldn't open $filename for reading!";
while (<INPUT_FILE>) {
        my $currentLine = $_;   
        # Remove empty lines and lines that start with digits
        if ($currentLine =~ /^[\s+|\d+]/){
            next;
        }

        # Remove all carriage returns
        $currentLine =~ s/\R$/ /;

        # Convert all letters to lower case
        $currentLine =~ s/([A-Z])/\l$1/g;

        # Capitalize after period <= STEP THAT DOES NOT WORK
        $currentLine =~ s/\.\s([a-z])/\. \u$1/g;        

        print $currentLine;
}
close(INPUT_FILE);

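3 answers:

Answer 0 (score: 3)

Try this.

Use a lookbehind to match the preceding context, capture the character that follows, and use \U to convert it to uppercase.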

$str ="...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...";
$str =~ s/(?<=\w\.\s)(\w)/\U$1/g;
print $str;

Or try \K, which keeps the text matched before it out of the replacement.

$str =~ s/\w\.\s\K(\w)/\U$1/g;
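As a quick check, here is a minimal sketch that applies both substitutions from this answer to the sample sentence in the question. The sub names capitalize_lookbehind and capitalize_keep are hypothetical, chosen only for illustration, and \K assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase a word character that follows
# "<word char>. " by using a fixed-width lookbehind.
sub capitalize_lookbehind {
    my ($text) = @_;
    $text =~ s/(?<=\w\.\s)(\w)/\U$1/g;
    return $text;
}

# Hypothetical helper: same idea with \K, which drops everything
# matched so far from the part that gets replaced.
sub capitalize_keep {
    my ($text) = @_;
    $text =~ s/\w\.\s\K(\w)/\U$1/g;
    return $text;
}

my $sample = "...so, that's our art. the 4 of us can now have a dialog. "
           . "we can have a conversation. we can speak to...";

print capitalize_lookbehind($sample), "\n";
print capitalize_keep($sample), "\n";
```

Both calls should print the sentence with "The", "We" and "We" capitalized. Because \w also matches digits, a digit after ". " is matched too, but \U leaves it unchanged.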

Answer 1 (score: 0)

One problem is this code:

    if ($currentLine =~ /^[\s+|\d+]/){
        next;
    }

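Contrary to the comment, this skips lines that start with a whitespace character, a digit, a plus sign, or a pipe symbol, because + and | are literal characters inside a character class. That may be leading you astray. You probably meant to write: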

    next if /^(\s+$|\d)/;

This skips a line if the whole line is whitespace, or if its first character is a digit.
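To see the difference between the two tests, here is a small sketch that runs both patterns over a few made-up lines. The sub names skipped_by_class and skipped_by_alternation and the sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Test from the question: a character class, so it matches one leading
# whitespace character, digit, "+" or "|".
sub skipped_by_class {
    my ($line) = @_;
    return $line =~ /^[\s+|\d+]/ ? 1 : 0;
}

# Suggested test: a whitespace-only line, or a leading digit.
sub skipped_by_alternation {
    my ($line) = @_;
    return $line =~ /^(\s+$|\d)/ ? 1 : 0;
}

# Made-up sample lines, each ending in a newline as if read from a file.
for my $line ("\n", "12\n", "  indented text\n", "+1 for that\n",
              "|pipe\n", "plain text\n") {
    (my $shown = $line) =~ s/\n\z//;
    printf "%-18s class=%d alternation=%d\n", "'$shown'",
        skipped_by_class($line), skipped_by_alternation($line);
}
```

The indented line and the lines starting with + or | are skipped only by the character-class version; the empty line and the line starting with a digit are skipped by both.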

You can simplify and generalize the loop like this:

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
        # Remove empty lines and lines that start with digits. sometimes
        next if /^(\s+$|\d)/;

        # Remove all carriage returns. forever
        s/\R$//;

        # Convert all letters to lower case. always
        s/([A-Z])/\l$1/g;

        # Capitalize after period <=... STEP THAT DOES NOT WORK
        s/\.\s([a-z])/\. \u$1/g;

        print "$_\n";
}

When the script is run on itself, the output is:

#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
        # remove empty lines and lines that start with digits. Sometimes
        next if /^(\s+$|\d)/;
        # remove all carriage returns. Forever
        s/\r$//;
        # convert all letters to lower case. Always
        s/([a-z])/\l$1/g;
        # capitalize after period <=... Step that does not work
        s/\.\s([a-z])/\. \u$1/g;
        print "$_\n";
}

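Note that for the converted script to work, you would need /gi as the modifier on the substitutions (rather than /g). There is still plenty of room to improve this code.

A basic way to check what is going on is to print everything at each step.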

while (<INPUT_FILE>) {
        print "## $_";
        my $currentLine = $_;   
        # Remove empty lines and lines that start with digits
        if ($currentLine =~ /^[\s+|\d+]/){
            print "#SKIP# $currentLine";
            next;
        }

        # Remove all carriage returns
        $currentLine =~ s/\R$/ /;
        print "#EOL# $currentLine##\n";

        # Convert all letters to lower case
        $currentLine =~ s/([A-Z])/\l$1/g;
        print "#LC# $currentLine##\n";

        # Capitalize after period <= STEP THAT DOES NOT WORK
        $currentLine =~ s/\.\s([a-z])/\. \u$1/g;        
        print "#CAPS# $currentLine##\n";

        print $currentLine;    # Needs a newline!
}

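This shows you what is happening and where it goes wrong. Note that replacing the generic EOL (\R) with a space means the output does not end with a newline. That is also a bad idea, which is why the output I generate ends with a newline: either keep the one read from the file, or add one back after removing it.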
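One way to put these points together is sketched below: strip the line ending, transform the text, then add a newline back. The helper name clean_line and the sample lines are hypothetical. The sketch also swaps in lc for the s/([A-Z])/\l$1/g step and uses \K (Perl 5.10 or later), which are my substitutions rather than the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: clean one line and return it with exactly one
# trailing newline, or an undefined value when the line is skipped.
sub clean_line {
    my ($line) = @_;
    return if $line =~ /^(\s+$|\d)/;     # blank line or leading digit
    $line =~ s/\R\z//;                   # strip any kind of line ending
    $line = lc $line;                    # lower-case everything
    $line =~ s/\.\s\K([a-z])/\u$1/g;     # capitalize a letter after ". "
    return "$line\n";                    # put a newline back
}

# Made-up lines in the shape of an SRT fragment with mixed line endings.
my @sample = (
    "1\n",
    "00:00:01,000 --> 00:00:04,000\n",
    "SO, THAT'S OUR ART. THE 4 OF US\r\n",
    "\n",
);

for my $line (@sample) {
    my $cleaned = clean_line($line);
    print $cleaned if defined $cleaned;
}
```

Each line is still handled on its own here, so a sentence that begins at the start of a line has no preceding ". " within that line for the pattern to match.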

Also, you should avoid ALL_CAPS file handles and use lexical file handles instead, for the cases where you need an explicit file handle at all.

open my $fh, '<', $filename
    or die "Couldn't open $filename for reading!";

Good job including the file name in the error message (though adding $! to report the system error message is also a good idea).
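A minimal sketch of that advice, with the lexical handle and $! combined, might look like this. The helper name open_for_reading is hypothetical and only there to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading with a lexical handle and
# include the system error ($!) in the message if the open fails.
sub open_for_reading {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or die "Couldn't open $filename for reading: $!\n";
    return $fh;
}

# Usage: echo the file named on the command line, if one was given.
if (@ARGV) {
    my $fh = open_for_reading($ARGV[0]);
    print while <$fh>;
    close $fh;
}
```

With a missing file, the message then carries both the file name and the reason reported by the system.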

Answer 2 (score: -1)

# (char)(char)(char)  (char)(char)(char) Uppercase the 3rd
$str =~ s/(\.)(\s)(\w)/$1$2\U$3/g;
print $str;

...so, that's our art. the 4 of us can now have a dialog. we can have a conversation. we can speak to...
...so, that's our art. The 4 of us can now have a dialog. We can have a conversation. We can speak to...