Question

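I'm trying to learn regular expressions. I know the basics and I'm not terrible at regex, I'm just not a pro, so I have a question for you all. If you know regex, I bet it's pretty simple.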

What I have so far is:

/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/

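What I'm trying to do is write a little script for myself that tidies up my music collection by formatting all the file names. I know there are other tools out there, but this is a learning experience for me. I already messed up all my titles once while turning things like "Hell Aint A Bad Place To Be" into "Hell Aint a Bad Place To Be": in my wisdom I somehow ended up with "Hell Aint a ad place to be" (I was searching for an A followed by a space and an uppercase character). Obviously that was a nightmare to fix and had to be done by hand. Needless to say, I now test on samples first.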

Anyway, the regex above is the first stage of many. Eventually I want to build on it, but for now I just need to get the simple bits working.

In the end I want to turn:

"arctic Monkeys- a fake tales of a san francisco"

into

"Arctic Monkeys - A Fake Tales of a San Francisco"

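I know that after ' - ' I'll need a lookbehind assertion, because if the first word there is 'a', 'of', etc., which I normally lowercase, I need to capitalize it. (I know the example above is a bad one for this use case.)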
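For the lookbehind part, here is a minimal PHP sketch (PHP being the language the answers below use). It assumes the dash has already been normalized to " - " with exactly one space on each side; `capitalize_after_dash()` is a hypothetical helper name. `(?<=- )` checks that the two characters before the match are "- " without including them in the match, so only the letter itself gets replaced. PCRE lookbehinds generally need a fixed length, which is why this sketch assumes exactly one space rather than `\s*`.

```php
<?php
// Hypothetical helper: uppercase the first letter of the word that directly
// follows " - ". The lookbehind (?<=- ) is zero-width, so the dash and space
// are checked but not consumed or replaced.
function capitalize_after_dash(string $title): string
{
    return preg_replace_callback(
        '/(?<=- )[a-z]/', // a lowercase letter immediately preceded by "- "
        function (array $m): string {
            return strtoupper($m[0]);
        },
        $title
    );
}

// Only the "a" right after the dash changes; the later "a" stays lowercase.
echo capitalize_after_dash('Arctic Monkeys - a Fake Tales of a San Francisco'), "\n";
// prints: Arctic Monkeys - A Fake Tales of a San Francisco
```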

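Any way to fix the existing regex would be great, and so would tips on where to look on my cheat sheet to finish the rest. (I'm not looking for a complete, full-blown answer, because I need to learn to do this myself; I just can't work out why \w+ only gets one word.)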
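To see the "only one word" behaviour concretely, here is a quick PHP check of the question's regex against two made-up sample file names. With a single-word title the pattern matches, but the first group holds only "Monkeys"; with a multi-word title it does not match at all.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/';

// \w matches one "word character" (letter, digit or underscore), never a space,
// so (\w+) can only capture the single word right before " - ".
if (preg_match($pattern, 'Arctic Monkeys - Fake.mp3', $m)) {
    echo $m[1], "\n"; // Monkeys
}

// With a multi-word title, the second (\w+) cannot reach the "." either,
// so the whole pattern fails.
var_dump(preg_match($pattern, 'Arctic Monkeys - Fake Tales of San Francisco.mp3')); // int(0)
```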

Answer 1

\w doesn't include whitespace. A working regex could be:

/^(.+?)\s*-\s*(.+)$/

Explanation:

^     - must start at the beginning of the string
(.+?) - match any characters, lazily (as few as possible)
\s*   - match any amount of whitespace that may be present (including none)
-     - match a literal - character
\s*   - any whitespace again
(.+)  - the remaining characters
$     - end of string

The capitalization itself would be done in a separate replacement regex.
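A quick PHP check of this pattern, assuming the input is the bare "artist - title" string without a file extension. Because `(.+?)` is lazy and `\s*` absorbs the padding, neither capture keeps the spaces around the dash, whether or not they are present:

```php
<?php
// Answer 1's pattern: a lazy "artist" group, a dash with optional whitespace
// on either side, then everything else as the "title" group.
$pattern = '/^(.+?)\s*-\s*(.+)$/';

$samples = [
    'arctic Monkeys- a fake tales of a san francisco', // no space before the dash
    'Arctic Monkeys - Fake Tales of San Francisco',     // spaces on both sides
];

foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $m)) {
        // $m[1] is the artist, $m[2] the title; case is left untouched.
        echo '[', $m[1], '] [', $m[2], "]\n";
    }
}
// [arctic Monkeys] [a fake tales of a san francisco]
// [Arctic Monkeys] [Fake Tales of San Francisco]
```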

Answer 2

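I'm a little confused about what you're doing, but maybe this will help. Remember that + means 1 or more characters and * means 0 or more. So you might want something like ([\s]*) to match spaces. You don't need to put {1} after a single character.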

So maybe something like this:

([\w\s]+)([\s]*)-([\s]*)([\w\s]+)\.([\w]{3,4})

I haven't tested this, but I think you get the idea.
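Since the pattern above is marked untested, here is a quick PHP check on an assumed sample file name. It does match, with one wrinkle: `[\w\s]+` is greedy, so the first group also swallows the space before "-" and the second group ends up empty. A `trim()` on the captured parts handles that:

```php
<?php
// Answer 2's pattern, applied to a sample file name.
$pattern = '/([\w\s]+)([\s]*)-([\s]*)([\w\s]+)\.([\w]{3,4})/';

if (preg_match($pattern, 'Arctic Monkeys - Fake Tales of San Francisco.mp3', $m)) {
    // $m[1] is "Arctic Monkeys " (trailing space) and $m[2] is "",
    // because the greedy first group already consumed the whitespace.
    $artist = trim($m[1]); // "Arctic Monkeys"
    $title  = trim($m[4]); // "Fake Tales of San Francisco"
    $ext    = $m[5];       // "mp3"
    echo $artist, ' - ', $title, '.', $ext, "\n";
}
```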

Answer 3

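I believe there's a simpler way to approach this: split the string into words with a simpler regex, then apply whatever processing you need to each word. That lets you perform more complex transformations on the text in a much cleaner way. Here's an example: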

<?php

$song = "arctic Monkeys- a fake tales of a san francisco";

// Split on spaces or - (the - is still present
// because it's only a lookahead match)
$words = preg_split("/([\s]+|(?=-))/", $song);

/*
Output for print_r:
Array
(
    [0] => arctic
    [1] => Monkeys
    [2] => -
    [3] => a
    [4] => fake
    [5] => tales
    [6] => of
    [7] => a
    [8] => san
    [9] => francisco
)
*/
print_r($words);

$new_words = array();
foreach ($words as $k => $word) {
        $new_words[] = processWord($word, $k, $words);
}

// This will output:
// Arctic Monkeys - A Fake Tales of a San Francisco
echo implode(' ', $new_words);

// You can add as many processing rules you want in here - in a very clean way
function processWord($word, $idx, $words) {
        // Check $idx > 0 so the first word doesn't read $words[-1] (undefined key).
        if ($idx > 0 && $words[$idx - 1] == '-') return ucfirst($word);
        return strlen($word) > 2 ? ucfirst($word) : $word;
}

Here is an example of this code running: http://codepad.org/t6pc8WpR
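As an illustration of adding another rule, here is a hypothetical variant of processWord() that uses an explicit list of small words instead of the length check (the length rule would, for example, turn "the" into "The"). The word list and the function name `processWordWithStopList()` are assumptions for this sketch. It is self-contained, so it repeats the split step from the answer:

```php
<?php
// Hypothetical variant of processWord(): small words stay lowercase unless
// they start the title or directly follow a "-".
function processWordWithStopList(string $word, int $idx, array $words): string
{
    $small = ['a', 'an', 'the', 'of', 'to', 'in', 'on', 'and']; // assumed list
    $lower = strtolower($word);
    if ($idx === 0 || $words[$idx - 1] === '-') {
        return ucfirst($lower);
    }
    return in_array($lower, $small, true) ? $lower : ucfirst($lower);
}

$song  = 'arctic Monkeys- a fake tales of a san francisco';
// Same split as the answer: on whitespace, or just before a "-" (lookahead).
$words = preg_split("/([\s]+|(?=-))/", $song);
$out   = [];
foreach ($words as $k => $word) {
    $out[] = processWordWithStopList($word, $k, $words);
}
echo implode(' ', $out), "\n"; // Arctic Monkeys - A Fake Tales of a San Francisco
```

Lowercasing each word before `ucfirst()` is a design choice here: it also normalizes words like "MONKEYS", which may or may not be what you want for band names.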

Answer 4

For the first part: \w doesn't match a word, it matches a word character. It's equivalent to [A-Za-z0-9_].

Instead, try ([A-Za-z0-9_ ]+) as your first bit (note the extra space inside the square brackets) and remove the \s that follows it.
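A small PHP comparison of the two approaches on an assumed sample file name. With a space added to the character class, the group runs across words, but it also picks up the space before the dash, so you may want to `trim()` it:

```php
<?php
$file = 'Arctic Monkeys - Fake Tales.mp3';

// \w is a single word character, so (\w+) stops at the first space
// and only the word right before " -" is captured.
preg_match('/(\w+)\s-/', $file, $m);
echo $m[1], "\n"; // Monkeys

// Adding a literal space to the class lets the group span several words;
// the \s before "-" is dropped because the class now consumes that space.
preg_match('/([A-Za-z0-9_ ]+)-/', $file, $m);
echo '[', $m[1], "]\n"; // [Arctic Monkeys ]  (note the trailing space)
```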

Answer 5

Here's what I came up with:

<?php
/**
 * Formats a string into a title:
 * * Pads all dashes with spaces.
 * * Uppercase all words with 3 letters or more.
 * * Uppercase first word and first words after dashes.
 *
 * @param $str
 *
 * @return string
 */
function format_title($str) {
    //Remove all spaces before and after dashes.
    //(These will return in the final product)
    $str = preg_replace("/\s?-\s?/", "-", $str);

    //Explode by dash.
    $string_split_by_dash = explode("-", $str);
    //For each sentence (separated by dashes)
    foreach ($string_split_by_dash as &$sentence) {
        //Uppercase all words.
        $sentence = ucwords($sentence);
        //Explode into words (by space)
        $words = explode(" ", $sentence);
        //For each word
        foreach ($words as &$word) {
            //If its length is smaller than 3
            if (strlen($word) < 3) {
                //Lowercase it.
                $word = strtolower($word);
            }
        }
        //Break the reference left over from foreach-by-reference.
        unset($word);
        //Implode back into a sentence.
        $sentence = implode(" ", $words);
        //Uppercase the first word, regardless of length.
        $sentence = ucfirst($sentence);
    }
    unset($sentence);

    //Implode all sentences back together with a space-padded dash.
    $str = implode(" - ", $string_split_by_dash);

    return $str;
}

$str = "arctic Monkeys- a fake tales of a san francisco";
var_dump(format_title($str));

I think this is more readable (and easier to document) than a regex. It may also be more efficient, though I haven't checked.

Why does this regex only capture one word?
